1. Introduction.

1.1. Background. Covariance matrix estimation is fundamental for almost all areas 
of multivariate analysis and many other applied problems. In particular, covariance ma- 
trices and their inverses play a central role in risk management and portfolio allocation. 
For example, the smallest and largest eigenvalues of a covariance matrix are related to 
the minimum and maximum variances of the selected portfolio, respectively, and the 
eigenvectors are related to portfolio allocation. Therefore, we need a good covariance
matrix estimator whose inversion does not excessively amplify the estimation error. See
Goldfarb and Iyengar (2003) for applications of covariance matrices to portfolio selec- 
tions and Johnstone (2001) for their statistical implications. 

Estimating high-dimensional covariance matrices is intrinsically challenging. For 
example, in portfolio allocation and risk management, the number of stocks p, which is 
typically of the same order as the sample size n, can well be on the order of hundreds.
In particular, when p = 200 there are more than 20,000 parameters in the covariance
matrix. Yet, the available sample size is usually on the order of hundreds or a few
thousand because longer time series (larger n) increase modeling bias. For instance,
by taking daily data of the past three years we have only roughly n = 750. So it is hard 
or even unrealistic to estimate covariance matrices without imposing any structure (see 
the rejoinder in Fan, 2005). 

Factor models have been widely used both theoretically and empirically in economics 
and finance. Derived by Ross (1976, 1977) using the Arbitrage Pricing Theory (APT) 
and by Chamberlain and Rothschild (1983) in a large economy, the multi-factor model 
states that the excess return of any asset $Y_i$ over the risk-free interest rate satisfies

(1.1)  $Y_i = b_{i1} f_1 + \cdots + b_{iK} f_K + \varepsilon_i, \quad i = 1, \ldots, p,$

where $f_1, \ldots, f_K$ are the excess returns of K factors, $b_{ij}$, $i = 1, \ldots, p$, $j = 1, \ldots, K$,
are unknown factor loadings, and $\varepsilon_1, \ldots, \varepsilon_p$ are p idiosyncratic errors uncorrelated given
$f_1, \ldots, f_K$. In the economics and finance literature, factors are implicitly assumed to be
observable, and there is a large literature devoted to the construction of factors (e.g. Fama
and French, 1992, 1993). The factor models have been widely applied in economics and
finance. See, for example, Ross (1976, 1977), Engle and Watson (1981), Chamberlain 
(1983), Chamberlain and Rothschild (1983), Diebold and Nerlove (1989), Fama and 
French (1992, 1993), Aguilar and West (2000), and Stock and Watson (2005) and refer- 
ences therein. These are extensions of the famous Capital Asset Pricing Model (CAPM) 
and can be regarded as efforts to approximate the market portfolio in the CAPM.

Thanks to the multi-factor model (1.1), if a few factors can completely capture the
cross-sectional risks, the number of parameters in covariance matrix estimation can be 
significantly reduced. For example, using the Fama-French three-factor model [Fama 
and French (1992, 1993)], there are 4p instead of p(p + 1)/2 parameters to be estimated.
Despite the popularity of factor models in the literature, the impact of dimensionality on
the estimation errors of covariance matrices and its applications to portfolio allocation
and risk management are poorly understood. This paper is devoted to such an
investigation. To make the multi-factor model more realistic, we allow
K to grow with the number of assets p and hence with the sample size n. As a result, 
we also investigate the impact of the number of factors on the estimation of covariance 
matrices, as well as its applications to portfolio allocation and risk management. To 
appreciate the derived rates of convergence, we compare them with those without using 
the factor structure. One natural candidate is the sample covariance matrix. This also 
allows us to examine the impact of dimensionality on the performance of the sample 
covariance matrix. Our results can also be regarded as an important step to understand 
the performance of factor models with unobservable factors. 

The factor model has been extensively studied in the literature [see, e.g. Scott (1966) 
and (1969), Browne (1987), Browne and Shapiro (1987), and Yuan and Bentler (1997)], 
but traditional work assumes the sample size n tends to infinity while the dimensionality 
p and the number of factors K are fixed. There is a relatively small literature on studies 
of models with a diverging number of parameters. See, for example, Huber (1973), 
Yohai and Maronna (1979), Portnoy (1984, 1985), and Bai (2003). In particular, Fan 
and Peng (2004) establish some asymptotic properties, as well as an oracle property, 
for nonconcave penalized likelihood estimators in the presence of a diverging number of 
parameters. One can further refer to seminal reviews by Donoho (2000) and Fan and Li 
(2006) for challenges of high dimensionality. But it still remains open to examine factor 
models with diverging dimensionality and growing number of factors for the purpose of 
covariance matrix estimation. 

The traditional covariance matrix estimator, the sample covariance matrix, is known 
to be unbiased, and it is invertible when the dimensionality is no larger than the sample 
size. See, for example, Eaton and Tyler (1991, 1994) for the asymptotic spectral distri- 
butions of random matrices including sample covariance matrices and their statistical 
implications. In the absence of prior information about the population covariance ma- 
trix, the sample covariance matrix is certainly a natural candidate in the case of small 
dimensionality, but no longer performs very well for moderate or large dimensionality
[see, e.g. Lin and Perlman (1985) and Johnstone (2001)]. Many approaches were
proposed in the literature to construct good covariance matrix estimators. Among them,
two main directions were taken. One is to remedy the sample covariance matrix and 
construct a better one by using approaches such as shrinkage and the eigen-method, 
etc. See, for example, Ledoit and Wolf (2004) and Stein (1975). The other one is to re- 
duce dimensionality by imposing some structure on the data. Many structures, such as 
sparsity, compound symmetry, and the autoregressive model, are widely used. Various 
approaches were taken to seek a balance between the bias and variance of covariance 
matrix estimators. See, for example, Dempster (1972), Leonard and Hsu (1992), Chiu, 
Leonard and Tsui (1996), Diggle and Verbyla (1998), Pourahmadi (2000), Boik (2002), 
Smith and Kohn (2002), Wong, Carter and Kohn (2003), Wu and Pourahmadi (2003), 
Huang, Liu and Pourahmadi (2004), and Li and Gui (2005). 

1.2. Covariance matrix estimation. We always denote by n the sample size, by p the
dimensionality, and by $f_1, \ldots, f_K$ the K observable factors, where p grows with sample
size n and K increases with dimensionality p. For ease of presentation, we rewrite factor
model (1.1) in matrix form

(1.2)  $\mathbf{y} = B_n \mathbf{f} + \boldsymbol{\varepsilon},$

where $\mathbf{y} = (Y_1, \ldots, Y_p)'$, $B_n = (\mathbf{b}_1, \ldots, \mathbf{b}_p)'$ with $\mathbf{b}_i = (b_{n,i1}, \ldots, b_{n,iK})'$, $i = 1, \ldots, p$,
$\mathbf{f} = (f_1, \ldots, f_K)'$, and $\boldsymbol{\varepsilon} = (\varepsilon_1, \ldots, \varepsilon_p)'$. Throughout we assume that
$E(\boldsymbol{\varepsilon} \mid \mathbf{f}) = 0$ and $\mathrm{cov}(\boldsymbol{\varepsilon} \mid \mathbf{f}) = \Sigma_{n,0}$ is diagonal.
For brevity of notation, we suppress the first subscript n in
some situations where the dependence on n is self-evident.

Let $(\mathbf{f}_1, \mathbf{y}_1), \ldots, (\mathbf{f}_n, \mathbf{y}_n)$ be n independent and identically distributed (i.i.d.) samples
of $(\mathbf{f}, \mathbf{y})$. We introduce here some notation used throughout the paper. Let

$\Sigma_n = \mathrm{cov}(\mathbf{y})$, $X = (\mathbf{f}_1, \ldots, \mathbf{f}_n)$, $Y = (\mathbf{y}_1, \ldots, \mathbf{y}_n)$ and $E = (\boldsymbol{\varepsilon}_1, \ldots, \boldsymbol{\varepsilon}_n)$.

Under model (1.2), we have

(1.3)  $\Sigma_n = \mathrm{cov}(B_n \mathbf{f}) + \mathrm{cov}(\boldsymbol{\varepsilon}) = B_n \mathrm{cov}(\mathbf{f}) B_n' + \Sigma_{n,0}.$

A natural idea for estimating $\Sigma_n$ is to plug in the least-squares estimators of $B_n$, $\mathrm{cov}(\mathbf{f})$,
and $\Sigma_{n,0}$. Therefore, we have a substitution estimator

(1.4)  $\hat{\Sigma}_n = \hat{B}_n \widehat{\mathrm{cov}}(\mathbf{f}) \hat{B}_n' + \hat{\Sigma}_{n,0},$

where $\hat{B}_n = Y X' (X X')^{-1}$ is the matrix of estimated regression coefficients,
$\widehat{\mathrm{cov}}(\mathbf{f}) = (n-1)^{-1} X X' - \{n(n-1)\}^{-1} X \mathbf{1}\mathbf{1}' X'$
is the sample covariance matrix of the factors f, and

$\hat{\Sigma}_{n,0} = \mathrm{diag}(n^{-1} \hat{E} \hat{E}')$

is the diagonal matrix of $n^{-1} \hat{E} \hat{E}'$ with $\hat{E} = Y - \hat{B} X$ the matrix of residuals. If the
factor model is not employed, then we have the sample covariance matrix estimator

(1.5)  $\hat{\Sigma}_{\mathrm{sam}} = (n-1)^{-1} Y Y' - \{n(n-1)\}^{-1} Y \mathbf{1}\mathbf{1}' Y'.$
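
The two estimators can be sketched numerically as follows. This is a minimal NumPy illustration of the formulas in (1.4) and (1.5), not the authors' code; the function names, the toy sizes, and the Gaussian toy data are assumptions made only for the example.

```python
import numpy as np


def factor_covariance(Y, X):
    """Substitution estimator following (1.4).

    Y: p x n matrix of returns, X: K x n matrix of observed factors.
    """
    n = Y.shape[1]
    ones = np.ones((n, 1))
    # B_hat = Y X' (X X')^{-1}; solve() avoids forming the inverse explicitly.
    B_hat = np.linalg.solve(X @ X.T, X @ Y.T).T
    # Sample covariance matrix of the factors.
    cov_f = X @ X.T / (n - 1) - (X @ ones) @ (ones.T @ X.T) / (n * (n - 1))
    # Diagonal of n^{-1} E_hat E_hat', computed row by row.
    E_hat = Y - B_hat @ X
    Sigma0_hat = np.diag((E_hat ** 2).sum(axis=1) / n)
    return B_hat @ cov_f @ B_hat.T + Sigma0_hat


def sample_covariance(Y):
    """Sample covariance matrix following (1.5)."""
    n = Y.shape[1]
    ones = np.ones((n, 1))
    return Y @ Y.T / (n - 1) - (Y @ ones) @ (ones.T @ Y.T) / (n * (n - 1))


# Hypothetical toy data with p > n (sizes and distributions are assumptions).
rng = np.random.default_rng(0)
n, p, K = 50, 80, 3
X = rng.normal(size=(K, n))
B = rng.normal(size=(p, K))
Y = B @ X + rng.normal(size=(p, n))

Sigma_hat = factor_covariance(Y, X)
Sigma_sam = sample_covariance(Y)
print(np.linalg.eigvalsh(Sigma_hat).min())   # smallest eigenvalue of the factor-based estimate
print(np.linalg.matrix_rank(Sigma_sam), p)   # rank of the sample covariance versus p
```

In this toy setting the sample covariance matrix has rank at most n - 1 and is therefore singular, while the factor-based estimate stays positive definite because the diagonal residual term is added to a positive semidefinite matrix.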

This paper mainly provides a theoretical understanding of the factor model with a 
diverging dimensionality and growing number of factors for the purpose of covariance 
matrix estimation; it does not aim to compare with other popular estimators. Throughout
the paper, we always contrast the performance of the covariance matrix estimator $\hat{\Sigma}$
in (1.4) with that of the sample covariance matrix $\hat{\Sigma}_{\mathrm{sam}}$ in (1.5). With prior information
of the true factor structure, the substitution estimator $\hat{\Sigma}$ is expected to perform better
than $\hat{\Sigma}_{\mathrm{sam}}$. However, this has not formally been shown, especially when $p \to \infty$ and
$K \to \infty$, and it is not always true. In addition, exact properties of this kind are not
well understood. Because the problem is important for portfolio management, we study
it in detail. Our conclusions can be summarized as follows.

• $\hat{\Sigma}$ is always invertible, even if p > n, while $\hat{\Sigma}_{\mathrm{sam}}$ suffers from the problem of
possibly being singular when dimensionality p is close to or larger than sample
size n.

• The advantage of the factor model lies in the estimation of the inverse of the
covariance matrix, not the estimation of the covariance matrix itself. When the
parameters involve the inverse of the covariance matrix, the factor model shows
substantial gains, whereas when the parameters involve the covariance matrix
directly, the factor model does not have much advantage. The latter runs against
the conventional wisdom.

• Portfolio allocation involves the inverse of the covariance matrix, and the
factor-model based estimates gain substantially, whereas risk management involves
the covariance matrix directly and the gain is only marginal.

• $\hat{\Sigma}$ has asymptotic normality, while in general $\hat{\Sigma}_{\mathrm{sam}}$ may not have asymptotic
normality of the same kind.

These properties will be demonstrated in our paper as follows.

1.3. Outline of the paper. In Section 2 we discuss some basic assumptions and present
the sampling properties of the estimator $\hat{\Sigma}$, as well as those of $\hat{\Sigma}_{\mathrm{sam}}$. We study the
impacts of the covariance matrix estimation on portfolio allocation and risk management
in Section 3. A simulation study is presented in Section 4, which augments our theoret- 
ical study. Section 5 contains some concluding remarks. The proofs of our results are 
given in Section 6. All the technical lemmas are relegated to the Appendix. 

2. Sampling properties. In this section we study the sampling properties of $\hat{\Sigma}$
and $\hat{\Sigma}_{\mathrm{sam}}$ with growing dimensionality and number of factors. We discuss some basic
assumptions in Section 2.1. The sampling properties are presented in Section 2.2. 

In the presence of diverging dimensionality, we should carefully choose appropriate 
norms for high dimensional matrices in different situations. We first introduce some 
notation. We always denote by $\lambda_1(A), \ldots, \lambda_q(A)$ the q eigenvalues of a q x q symmetric
matrix A in decreasing order. For any matrix $A = (a_{ij})$, its Frobenius norm is given by

(2.1)  $\|A\| = \{\mathrm{tr}(A A')\}^{1/2}.$

In particular, if A is a q x q symmetric matrix, then $\|A\| = \{\sum_{i=1}^{q} \lambda_i(A)^2\}^{1/2}$. The
Frobenius norm as well as many other matrix norms [see Horn and Johnson (1985)] is
intrinsically related to the eigenvalues or singular values of matrices.

Despite its popularity, the Frobenius norm is not appropriate for understanding the 
performance of the factor-model based estimation of the covariance matrix. To see this, 
let us consider a simple example. Suppose we know ideally that $B = \mathbf{1}$ and $\mathrm{cov}(\boldsymbol{\varepsilon} \mid f) = I_p$
in model (1.2) with a single factor f. Then we have a substitution covariance matrix
estimator $\hat{\Sigma} = \mathbf{1}\,\widehat{\mathrm{var}}(f)\,\mathbf{1}' + I_p$ as in (1.4). It is a classical result that

$E\,|\widehat{\mathrm{var}}(f) - \mathrm{var}(f)|^2 = O(n^{-1}).$

Thus by (1.3), we have

$\hat{\Sigma} - \Sigma = \mathbf{1}\,[\widehat{\mathrm{var}}(f) - \mathrm{var}(f)]\,\mathbf{1}'$

and the Frobenius norm $\|\hat{\Sigma} - \Sigma\| = |\widehat{\mathrm{var}}(f) - \mathrm{var}(f)|\,p$ picks up and amplifies the
estimation error from $\widehat{\mathrm{var}}(f)$. Consequently,

$E\,\|\hat{\Sigma} - \Sigma\|^2 = O(n^{-1} p^2).$

On the other hand, by assuming boundedness of the fourth moments of y across n, a
routine calculation reveals that

$E\,\|\hat{\Sigma}_{\mathrm{sam}} - \Sigma\|^2 = O(n^{-1} p^2).$

This shows that under the Frobenius norm, $\hat{\Sigma}$ and $\hat{\Sigma}_{\mathrm{sam}}$ have the same convergence rate
and perform roughly the same. Thus we should seek other norms that fully employ
the factor structure. By assuming the eigenvalues of $\Sigma$ are bounded away from 0 and
var(f) > 0, routine calculations show that

$\|\Sigma^{-1/2}(\hat{\Sigma}_{\mathrm{sam}} - \Sigma)\Sigma^{-1/2}\| = O_P(n^{-1/2} p),$

whereas $\|\Sigma^{-1/2}(\hat{\Sigma} - \Sigma)\Sigma^{-1/2}\| = O_P(n^{-1/2})$. Therefore, with prior information of the
true factor structure, $\hat{\Sigma}$ performs much better than $\hat{\Sigma}_{\mathrm{sam}}$ from this point of view.

Motivated by the above example, we first fix a sequence of positive definite covariance
matrices $\Sigma_n$ of dimensionality $p_n$, n = 1, 2, ..., and define a new norm

(2.2)  $\|A\|_{\Sigma_n} = p^{-1/2}\,\|\Sigma_n^{-1/2} A\, \Sigma_n^{-1/2}\|$

for any $p_n \times p_n$ matrix A. In particular, we have $\|\Sigma_n\|_{\Sigma_n} = p^{-1/2}\|I_p\| = 1$. The
inclusion of a normalization factor $p^{-1/2}$ above is not essential and we incorporate it
to take into account the diverging dimensionality. As seen below, under this new norm
$\|\cdot\|_{\Sigma}$, the consistency rate in the factor approach is better than that in the sample
approach. Equivalently, we are investigating convergence rates under the loss function

(2.3)  $L(\hat{\Sigma}, \Sigma) = p^{1/2}\,\|\hat{\Sigma} - \Sigma\|_{\Sigma} = \{\mathrm{tr}[(\hat{\Sigma}\Sigma^{-1} - I)^2]\}^{1/2}.$

The above definition of the norm $\|\cdot\|_{\Sigma}$ seems a bit artificial and involves the inverse of
the true covariance matrix, but it is very similar to the entropy loss function proposed
by James and Stein (1961). See Section 4 for further details. Intrinsically, this norm
takes into account and fully employs the factor structure. In fact, as shown in the above
example, the advantage of the factor structure lies in better performance of the inverse
$\hat{\Sigma}^{-1}$. We will see later in this section that $\hat{\Sigma}^{-1}$ is a much better estimator of $\Sigma^{-1}$ than
$\hat{\Sigma}_{\mathrm{sam}}^{-1}$, and this advantage is carried further in portfolio allocation.
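
The norm in (2.2) and the trace form of the loss in (2.3) can be checked against each other numerically. The sketch below is only an illustration; the helper names `inv_sqrt_spd`, `sigma_norm`, and `loss_trace_form`, as well as the toy matrices, are assumptions introduced here.

```python
import numpy as np


def inv_sqrt_spd(Sigma):
    """Inverse symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(Sigma)
    # V diag(w^{-1/2}) V': dividing the columns of V by sqrt(w) does the scaling.
    return (V / np.sqrt(w)) @ V.T


def sigma_norm(A, Sigma):
    """Norm (2.2): p^{-1/2} times the Frobenius norm of Sigma^{-1/2} A Sigma^{-1/2}."""
    p = Sigma.shape[0]
    R = inv_sqrt_spd(Sigma)
    return np.linalg.norm(R @ A @ R, "fro") / np.sqrt(p)


def loss_trace_form(Sigma_hat, Sigma):
    """Right-hand side of (2.3): {tr[(Sigma_hat Sigma^{-1} - I)^2]}^{1/2}."""
    p = Sigma.shape[0]
    M = Sigma_hat @ np.linalg.inv(Sigma) - np.eye(p)
    return np.sqrt(np.trace(M @ M))


# Hypothetical positive definite "truth" and a symmetric perturbation of it.
rng = np.random.default_rng(0)
p = 6
G = rng.normal(size=(p, p))
Sigma = G @ G.T + p * np.eye(p)
H = rng.normal(size=(p, p))
Sigma_hat = Sigma + 0.1 * (H + H.T)

print(sigma_norm(Sigma, Sigma))                           # normalization check for (2.2)
print(np.sqrt(p) * sigma_norm(Sigma_hat - Sigma, Sigma))  # left-hand side of (2.3)
print(loss_trace_form(Sigma_hat, Sigma))                  # right-hand side of (2.3)
```

The last two printed values agree because $\hat{\Sigma}\Sigma^{-1} - I$ is similar to the symmetric matrix $\Sigma^{-1/2}\hat{\Sigma}\Sigma^{-1/2} - I$, so both have the same trace of squares.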

2.1. Some basic assumptions. Let $b_n = E\|\mathbf{y}\|^2$, $c_n = \max_{1 \le i \le K} E(f_i^4)$, and
$d_n = \max_{1 \le i \le p} E(\varepsilon_i^4)$.

(A) $(\mathbf{f}_1, \mathbf{y}_1), \ldots, (\mathbf{f}_n, \mathbf{y}_n)$ are n i.i.d. samples of $(\mathbf{f}, \mathbf{y})$. $E(\boldsymbol{\varepsilon} \mid \mathbf{f}) = 0$ and
$\mathrm{cov}(\boldsymbol{\varepsilon} \mid \mathbf{f}) = \Sigma_{n,0}$ is diagonal. Also, the distribution of f is continuous and K < p.

The first and second parts are usual conditions, and it is realistic to put K < p.
The assumption that f has a continuous distribution is made to ensure that the K x K
matrix XX' is invertible with probability one when n > K. Clearly, the covariance
matrix estimator $\hat{\Sigma}$ is positive definite with probability one whenever n > K. By the
assumption that the K factors capture the cross-sectional risks, the idiosyncratic noises
are uncorrelated, so $\Sigma_{n,0}$ is diagonal.

(B) $b_n = O(p)$ and the sequences $c_n$ and $d_n$ are bounded. Also, there exists a constant
$\sigma_1 > 0$ such that $\lambda_K(\mathrm{cov}(\mathbf{f})) \ge \sigma_1$ for all n.

This is a technical assumption. In view of $E\|\mathbf{y}\|^2 = \sum_{i=1}^{p} E Y_i^2$, $b_n = O(p)$ is a
reasonable condition. The assumption $c_n = O(1)$ shows that the fourth moments of f
are bounded across n, which facilitates the study of the sample covariance matrix of f.
The uniform lower bound imposed on the eigenvalues of cov(f) helps the study of the
inverse of the sample covariance matrix of f since $K \to \infty$, and it along with $b_n = O(p)$
entails that $\|B_n\| = O(p^{1/2})$. It is evident from our theoretical analysis that $\lambda_K(\mathrm{cov}(\mathbf{f}))$
can be allowed to tend to zero at some rate, which results in slower convergence rates
of the estimators. But we do not pursue this direction here.



(C) There exists a constant $\sigma_2 > 0$ such that $\lambda_p(\Sigma_{n,0}) \ge \sigma_2$ for all n.

This is a reasonable assumption and ensures that all the eigenvalues of the $\Sigma_n$'s are
bounded away from 0 in view of (1.3). In particular, we have $\|\Sigma_n^{-1}\| = O(p^{1/2})$. Our
theoretical analysis applies to the case where $\lambda_p(\Sigma_{n,0})$ tends to zero at some rate, but
we do not pursue this direction for simplicity.



(D) The K factors $f_1, \ldots, f_K$ are fixed across n, and $p^{-1} B_n' B_n \to A$ as $n \to \infty$ for
some K x K symmetric positive semidefinite matrix A.

This assumption is used only to establish asymptotic normality of the estimator $\hat{\Sigma}$,
which facilitates statistical inferences. In view of $p^{-1} B_n' B_n = p^{-1}(\mathbf{b}_1 \mathbf{b}_1' + \cdots + \mathbf{b}_p \mathbf{b}_p')$,
this assumption is reasonable when K is fixed.

2.2. Sampling properties. 

Theorem 1 (Rates of convergence under Frobenius norm). Under conditions (A)
and (B), we have $\|\hat{\Sigma} - \Sigma\| = O_P(n^{-1/2} pK)$ and $\|\hat{\Sigma}_{\mathrm{sam}} - \Sigma\| = O_P(n^{-1/2} pK)$. In
addition, we have

$\max_{1 \le k \le p} |\lambda_k(\hat{\Sigma}_n) - \lambda_k(\Sigma_n)| = o_P\{(p^2 K^2 \log n / n)^{1/2}\}$

and

$\max_{1 \le k \le p} |\lambda_k(\hat{\Sigma}_{\mathrm{sam}}) - \lambda_k(\Sigma_n)| = o_P\{(p^2 K^2 \log n / n)^{1/2}\}.$

From this theorem, we see that under the Frobenius norm, the dimensionality reduces
rates of convergence by an order of pK, which is the order of the number of parameters.
The above rate of eigenvalues of $\hat{\Sigma}$ is optimal. To see it, let us extend the previous
example by including K factors $f_1, \ldots, f_K$ and setting $B = (1, \ldots, 1)_{p \times K}$. Further
suppose we know ideally that $\mathrm{cov}(\mathbf{f}) = \mathrm{var}(f_1) I_K$. Then we have

$\Sigma_n = I_p + \mathrm{var}(f_1) K \mathbf{1}\mathbf{1}'$ and $\hat{\Sigma}_n = I_p + \widehat{\mathrm{var}}(f_1) K \mathbf{1}\mathbf{1}'$.

It is easy to see that $\lambda_1(\Sigma_n) = \mathrm{var}(f_1) pK + 1$, $\lambda_k(\Sigma_n) = 1$, $k = 2, \ldots, p$, and
$\lambda_1(\hat{\Sigma}_n) = \widehat{\mathrm{var}}(f_1) pK + 1$, $\lambda_k(\hat{\Sigma}_n) = 1$, $k = 2, \ldots, p$. Thus,

$\max_{1 \le k \le p} |\lambda_k(\hat{\Sigma}_n) - \lambda_k(\Sigma_n)| = |\widehat{\mathrm{var}}(f_1) - \mathrm{var}(f_1)|\, pK = O_P(n^{-1/2} pK).$

Therefore, $\hat{\Sigma}$ here attains the optimal uniform weak convergence rate of eigenvalues.
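
The eigenvalue structure used in this example is easy to confirm numerically. The following sketch builds the matrix $I_p + v K \mathbf{1}\mathbf{1}'$ for a hypothetical true variance and a hypothetical estimated variance (both numbers are assumptions chosen for illustration) and compares the largest eigenvalue gap with $|\hat{v} - v|\,pK$.

```python
import numpy as np


def example_eigenvalues(p, K, var_f1):
    """Eigenvalues, in decreasing order, of I_p + var_f1 * K * 1 1'."""
    Sigma = np.eye(p) + var_f1 * K * np.ones((p, p))
    return np.linalg.eigvalsh(Sigma)[::-1]


p, K = 6, 2
true_var, est_var = 0.50, 0.53          # hypothetical var(f_1) and its estimate

lam = example_eigenvalues(p, K, true_var)
lam_hat = example_eigenvalues(p, K, est_var)
max_gap = np.max(np.abs(lam_hat - lam))

print(lam)        # one eigenvalue equal to var(f_1) p K + 1, the rest equal to 1
print(max_gap)    # compare with |est_var - true_var| * p * K
```

Only the leading eigenvalue moves with the variance estimate, which is why the uniform eigenvalue error inherits the factor pK.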

Theorem 1 shows that the factor structure does not give much advantage in estimating
$\Sigma$. The next theorem shows that when $\Sigma^{-1}$ is involved, the rate of convergence is
improved.



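Theorem 2 (Rates of convergence under norm $\|\cdot\|_{\Sigma}$). Suppose that $K = O(n^{\alpha_1})$
and $p = O(n^{\alpha})$. Under conditions (A)-(C), we have $\|\hat{\Sigma} - \Sigma\|_{\Sigma} = O_P(n^{-\beta/2})$ with
$\beta = \min(1 - 2\alpha_1, 2 - \alpha - \alpha_1)$ and $\|\hat{\Sigma}_{\mathrm{sam}} - \Sigma\|_{\Sigma} = O_P(n^{-\beta_1/2})$ with
$\beta_1 = 1 - \max(\alpha, 3\alpha_1/2, 3\alpha_1 - \alpha)$.

It is easy to show that $\beta > \beta_1$ whenever $\alpha > 2\alpha_1$ and $\alpha_1 < 1$. Hence, the sample
covariance matrix $\hat{\Sigma}_{\mathrm{sam}}$ has slower convergence. An interesting case is K = O(1). In
this case, under the norm $\|\cdot\|_{\Sigma}$, $\hat{\Sigma}$ has convergence rate $n^{-\beta/2}$ with $\beta = \min(1, 2 - \alpha)$,
whereas $\hat{\Sigma}_{\mathrm{sam}}$ has slower convergence rate $n^{-\beta_1/2}$ with $\beta_1 = 1 - \alpha$. In particular, when
$\alpha \le 1$, $\hat{\Sigma}$ is root-n-consistent under $\|\cdot\|_{\Sigma}$. This can be shown to be optimal by some
calculations using a specific factor model mentioned above.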
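
The exponents in Theorem 2 are simple to tabulate. The helper below is a hypothetical convenience function (its name and argument order are assumptions) that evaluates $\beta$ and $\beta_1$ for given growth exponents $\alpha$ (for p) and $\alpha_1$ (for K).

```python
def convergence_exponents(alpha, alpha1):
    """Return (beta, beta1) from Theorem 2 for p = O(n^alpha), K = O(n^alpha1)."""
    beta = min(1 - 2 * alpha1, 2 - alpha - alpha1)            # factor-based estimator
    beta1 = 1 - max(alpha, 1.5 * alpha1, 3 * alpha1 - alpha)  # sample covariance matrix
    return beta, beta1


# K = O(1) and p growing like n^{1/2}: root-n rate versus n^{-1/4}.
print(convergence_exponents(0.5, 0.0))
```

For example, with a bounded number of factors and $p$ of order $n^{1/2}$, the call returns $\beta = 1$ and $\beta_1 = 0.5$, matching the K = O(1) discussion above.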



Theorem 3 (Rates of convergence of inverse under Frobenius norm). Under conditions
(A)-(C), we have

$\|\hat{\Sigma}_n^{-1} - \Sigma_n^{-1}\| = o_P\{(p^2 K^4 \log n / n)^{1/2}\},$

whereas

$\|\hat{\Sigma}_{\mathrm{sam}}^{-1} - \Sigma_n^{-1}\| = o_P\{(p^4 K^2 \log n / n)^{1/2}\}.$

From this theorem, we see that when K = o(p), $\hat{\Sigma}^{-1}$ performs much better than
$\hat{\Sigma}_{\mathrm{sam}}^{-1}$. As expected, they perform roughly the same in the extreme case where K is
proportional to p. It is very pleasing that under an additional assumption (C), $\hat{\Sigma}^{-1}$ has
a consistency rate slightly slower than $\hat{\Sigma}$ under the Frobenius norm, since $\hat{\Sigma}^{-1}$ involves
the inverse of the K x K sample covariance matrix of f. The consistency result of $\hat{\Sigma}_{\mathrm{sam}}^{-1}$ is
implied by that of $\hat{\Sigma}_{\mathrm{sam}}$, thanks to a simple inequality in matrix theory on inverses under
perturbation. However, the consistency result of $\hat{\Sigma}^{-1}$ needs a very delicate analysis of
inverse matrices. This theorem will be used in Section 3.1 to examine the variance of a
mean-variance optimal portfolio.

Before going further, we first introduce some standard notation. Let $A = (a_{ij})$ be a
q x r matrix and denote by vec(A) the qr x 1 vector formed by stacking the r columns
of A underneath each other in the order from left to right. In particular, for any d x d
symmetric matrix A, we denote by vech(A) the d(d + 1)/2 x 1 vector obtained from
vec(A) by removing the above-diagonal entries of A. It is not difficult to see that there
exists a unique $d^2 \times d(d+1)/2$ matrix $D_d$ of zeros and ones such that

$D_d\, \mathrm{vech}(A) = \mathrm{vec}(A)$

for any d x d symmetric matrix A. $D_d$ is called the duplication matrix of order d.
Clearly, for any d x d symmetric matrix A, we have

$P_D\, \mathrm{vec}(A) = \mathrm{vech}(A),$

where $P_D = (D'D)^{-1} D'$. For any q x r matrix $A_1 = (a_{ij})$ and s x t matrix $A_2$, we
define their Kronecker product $A_1 \otimes A_2$ as the qs x rt matrix $(a_{ij} A_2)$.
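
The vec, vech, and duplication-matrix notation can be made concrete with a small sketch. The function names below are assumptions introduced for illustration; `np.kron` is NumPy's Kronecker product, which follows the same block convention $(a_{ij} A_2)$.

```python
import numpy as np


def vec(A):
    """Stack the columns of A from left to right."""
    return A.reshape(-1, order="F")


def vech(A):
    """Stack the on- and below-diagonal entries of A column by column."""
    d = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(d)])


def duplication_matrix(d):
    """d^2 x d(d+1)/2 matrix D of zeros and ones with D vech(A) = vec(A) for symmetric A."""
    D = np.zeros((d * d, d * (d + 1) // 2))
    col = 0
    for j in range(d):
        for i in range(j, d):
            D[j * d + i, col] = 1.0   # position of entry (i, j) in vec(A)
            D[i * d + j, col] = 1.0   # position of entry (j, i) in vec(A)
            col += 1
    return D


A = np.array([[4.0, 1.0, 2.0],
              [1.0, 5.0, 3.0],
              [2.0, 3.0, 6.0]])
D = duplication_matrix(3)
P_D = np.linalg.solve(D.T @ D, D.T)          # (D'D)^{-1} D'

print(np.allclose(D @ vech(A), vec(A)))      # D vech(A) = vec(A)
print(np.allclose(P_D @ vec(A), vech(A)))    # P_D vec(A) = vech(A)
print(np.kron(np.ones((2, 3)), np.ones((4, 5))).shape)   # qs x rt
```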

Theorem 4 (Asymptotic normality). Under conditions (A), (B), and (D), if $p \to \infty$
as $n \to \infty$, then the estimator $\hat{\Sigma}$ satisfies

$\sqrt{n}\, \mathrm{vech}[p^{-2} B_n' (\hat{\Sigma}_n - \Sigma_n) B_n] \xrightarrow{\mathcal{D}} \mathcal{N}(0, G),$

where $G = P_D (A \otimes A) D H D' (A \otimes A) P_D'$, $H = \mathrm{cov}[\mathrm{vech}(U)]$ with $U = (u_{ij})_{K \times K}$ and

$\mathrm{cov}(u_{ij}, u_{kl}) = \kappa^{ijkl} + \kappa^{ik} \kappa^{jl} + \kappa^{il} \kappa^{jk},$

$\kappa^{i_1 \cdots i_r}$ is the central moment $E[(f_{i_1} - E f_{i_1}) \cdots (f_{i_r} - E f_{i_r})]$ of $\mathbf{f} = (f_1, \ldots, f_K)'$, D is
the duplication matrix of order K, and $P_D = (D'D)^{-1} D'$.

When f has a K-variate normal distribution with covariance matrix $(\sigma_{ij})_{K \times K}$, the
matrix H in Theorem 4 is determined by

$\mathrm{cov}(u_{ij}, u_{kl}) = \sigma_{ik} \sigma_{jl} + \sigma_{il} \sigma_{jk}.$

The diverging dimensionality takes care of a troublesome term in establishing asymptotic
normality. However, in the finite dimensional setting, one can only show asymptotic
normality when f has mean 0, where cov(f) can be estimated as $\widehat{\mathrm{cov}}(\mathbf{f}) = n^{-1} X X'$, and
in general, $\hat{\Sigma}$ may have no asymptotic normality because the term $X \mathbf{1}\mathbf{1}' X' (X X')^{-1} X$
may not have a limiting behavior as $n \to \infty$ (at least it is not clear now). This is an
interesting phenomenon in the presence of diverging dimensionality.

3. Impacts on portfolio allocation and risk management. In this section we 
examine the impacts of covariance matrix estimation on portfolio allocation and risk 
management, respectively. 

3.1. Impact on portfolio allocation. For practical use in portfolio allocation, one would 
expect that the optimal portfolio constructed from the covariance matrix estimated from 
the history should not deviate too much from the true one. So we examine the behavior 
of the optimal portfolio constructed using $\hat{\Sigma}$ estimated from historical data.

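Markowitz (1952) defines the mean-variance optimal portfolio as the solution $\boldsymbol{\xi}_n \in \mathbb{R}^p$
to the following minimization problem

(3.1)  $\min_{\boldsymbol{\xi}} \boldsymbol{\xi}' \Sigma_n \boldsymbol{\xi}$  subject to  $\boldsymbol{\xi}' \mathbf{1} = 1$ and $\boldsymbol{\xi}' \boldsymbol{\mu}_n = \gamma_n$,

where $\mathbf{1}$ is a p x 1 vector of ones, $\boldsymbol{\mu}_n = E(\mathbf{y})$, and $\gamma_n$ is the expected rate of return
imposed on the portfolio. It is well known that Markowitz's optimal portfolio [see
Markowitz (1959), Cochrane (2001), or Campbell, Lo and MacKinlay (1997)] is

(3.2)  $\boldsymbol{\xi}_n = \frac{\phi_n - \gamma_n \psi_n}{\varphi_n \phi_n - \psi_n^2} \Sigma_n^{-1} \mathbf{1} + \frac{\gamma_n \varphi_n - \psi_n}{\varphi_n \phi_n - \psi_n^2} \Sigma_n^{-1} \boldsymbol{\mu}_n$

with $\varphi_n = \mathbf{1}' \Sigma_n^{-1} \mathbf{1}$, $\psi_n = \mathbf{1}' \Sigma_n^{-1} \boldsymbol{\mu}_n$, and $\phi_n = \boldsymbol{\mu}_n' \Sigma_n^{-1} \boldsymbol{\mu}_n$, and its variance is

(3.3)  $\boldsymbol{\xi}_n' \Sigma_n \boldsymbol{\xi}_n = \frac{\varphi_n \gamma_n^2 - 2 \psi_n \gamma_n + \phi_n}{\varphi_n \phi_n - \psi_n^2}.$

Denote by $\boldsymbol{\xi}_{ng}$ the $\boldsymbol{\xi}_n$ in (3.2) with $\gamma_n$ replaced by $\psi_n / \varphi_n$. The global minimum variance
without constraint on the expected return is

(3.4)  $\boldsymbol{\xi}_{ng}' \Sigma_n \boldsymbol{\xi}_{ng} = 1 / \varphi_n,$

which is attained in (3.3) when $\gamma_n = \psi_n / \varphi_n$.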
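
A small numerical sketch of the mean-variance solution may help fix the notation. It assumes the standard Lagrangian solution of the problem in (3.1), written with three scalars named here `varphi` $= \mathbf{1}'\Sigma^{-1}\mathbf{1}$, `psi` $= \mathbf{1}'\Sigma^{-1}\boldsymbol{\mu}$, and `phi` $= \boldsymbol{\mu}'\Sigma^{-1}\boldsymbol{\mu}$; the function name and the toy inputs are assumptions for illustration only.

```python
import numpy as np


def markowitz(Sigma, mu, gamma):
    """Weights and variance of the mean-variance optimal portfolio.

    Minimizes xi' Sigma xi subject to xi'1 = 1 and xi'mu = gamma.
    """
    one = np.ones(len(mu))
    Si_one = np.linalg.solve(Sigma, one)     # Sigma^{-1} 1
    Si_mu = np.linalg.solve(Sigma, mu)       # Sigma^{-1} mu
    varphi = one @ Si_one
    psi = one @ Si_mu
    phi = mu @ Si_mu
    det = varphi * phi - psi ** 2            # positive unless mu is proportional to 1
    xi = ((phi - gamma * psi) * Si_one + (gamma * varphi - psi) * Si_mu) / det
    variance = (varphi * gamma ** 2 - 2 * psi * gamma + phi) / det
    return xi, variance


# Hypothetical inputs: a random positive definite Sigma and mean vector.
rng = np.random.default_rng(1)
p = 5
G = rng.normal(size=(p, p))
Sigma = G @ G.T + p * np.eye(p)
mu = rng.normal(loc=0.05, scale=0.02, size=p)
gamma = 0.10

xi, variance = markowitz(Sigma, mu, gamma)
print(xi.sum(), xi @ mu)             # the two constraints
print(xi @ Sigma @ xi, variance)     # direct variance versus closed form

# Global minimum variance portfolio: gamma = psi / varphi.
one = np.ones(p)
varphi = one @ np.linalg.solve(Sigma, one)
psi = one @ np.linalg.solve(Sigma, mu)
xi_g, var_g = markowitz(Sigma, mu, psi / varphi)
print(var_g, 1.0 / varphi)
```

Replacing `Sigma` and `mu` by estimates gives the plug-in portfolios studied in the remainder of this section.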

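Based on the history, we can construct $\hat{\Sigma}_n$ as before. Also, we have a substitution
estimator $\hat{\boldsymbol{\mu}}_n = \hat{B}_n n^{-1} (\mathbf{f}_1 + \cdots + \mathbf{f}_n)$ of the mean vector $\boldsymbol{\mu}_n$. As above, we can define
estimators $\hat{\boldsymbol{\xi}}_n$, $\hat{\boldsymbol{\xi}}_{ng}$ and $\hat{\varphi}_n$, $\hat{\psi}_n$, $\hat{\phi}_n$ with $\Sigma_n$ and $\boldsymbol{\mu}_n$ replaced by $\hat{\Sigma}_n$ and $\hat{\boldsymbol{\mu}}_n$, respectively.

It is interesting to study the deviation of the constructed optimal portfolio $\hat{\boldsymbol{\xi}}_n$ and
the globally optimal portfolio $\hat{\boldsymbol{\xi}}_{ng}$ from the theoretical ones, say, $\boldsymbol{\xi}_n$ and $\boldsymbol{\xi}_{ng}$. But here
we do not pursue this direction because it is more valuable to study the risk associated
with them. Therefore, we only examine the behavior of the minimum variance $\hat{\boldsymbol{\xi}}_n' \hat{\Sigma}_n \hat{\boldsymbol{\xi}}_n$
and global minimum variance $\hat{\boldsymbol{\xi}}_{ng}' \hat{\Sigma}_n \hat{\boldsymbol{\xi}}_{ng}$ in this section.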

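Theorem 5 (Weak convergence of global minimum variance). Suppose that all the
$\varphi_n$'s are bounded away from zero. Under conditions (A)-(C), we have

$\hat{\boldsymbol{\xi}}_{ng}' \hat{\Sigma}_n \hat{\boldsymbol{\xi}}_{ng} - \boldsymbol{\xi}_{ng}' \Sigma_n \boldsymbol{\xi}_{ng} = o_P\{(p^4 K^4 \log n / n)^{1/2}\},$

whereas

$\hat{\boldsymbol{\xi}}_{\mathrm{sam},ng}' \hat{\Sigma}_{\mathrm{sam}} \hat{\boldsymbol{\xi}}_{\mathrm{sam},ng} - \boldsymbol{\xi}_{ng}' \Sigma_n \boldsymbol{\xi}_{ng} = o_P\{(p^6 K^2 \log n / n)^{1/2}\}.$

Theorem 6 (Weak convergence to optimal portfolio). Suppose that $\varphi_n \phi_n - \psi_n^2$ are
bounded away from zero and $\varphi_n / (\varphi_n \phi_n - \psi_n^2)$, $\psi_n / (\varphi_n \phi_n - \psi_n^2)$, $\phi_n / (\varphi_n \phi_n - \psi_n^2)$, $\gamma_n$ are
bounded. Under conditions (A)-(C), we have

$\hat{\boldsymbol{\xi}}_n' \hat{\Sigma}_n \hat{\boldsymbol{\xi}}_n - \boldsymbol{\xi}_n' \Sigma_n \boldsymbol{\xi}_n = o_P\{(p^4 K^4 \log n / n)^{1/2}\},$

whereas

$\hat{\boldsymbol{\xi}}_{\mathrm{sam},n}' \hat{\Sigma}_{\mathrm{sam}} \hat{\boldsymbol{\xi}}_{\mathrm{sam},n} - \boldsymbol{\xi}_n' \Sigma_n \boldsymbol{\xi}_n = o_P\{(p^6 K^2 \log n / n)^{1/2}\}.$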

The assumptions on $\varphi_n$, $\psi_n$ and $\phi_n$ in Theorems 5 and 6 are technical and reasonable.
In view of (3.4), the assumption on $\varphi_n$ in Theorem 5 amounts to saying that the global
minimum variances are bounded across n. The additional assumptions in Theorem 6
can be understood in a similar way in light of (3.3). From the above two theorems,
we see that when K = o(p), $\hat{\Sigma}$ performs much better than $\hat{\Sigma}_{\mathrm{sam}}$ from the point of
view of portfolio allocation. On the other hand, we also see that dimensionality as
well as number of factors can only grow slowly with sample size so that the globally
optimal portfolio and the mean-variance optimal portfolio constructed using estimated
covariance matrix $\hat{\Sigma}$ or $\hat{\Sigma}_{\mathrm{sam}}$ behave similarly to theoretical ones. So high dimensionality
does impose a great challenge on portfolio allocation.

Our study reveals that for a large number of stocks, additional structures are needed. 
For example, we may group assets according to sectors and assume that the sector
correlations are weak and negligible. Hence, the covariance structure is block diagonal.
Our factor model approach can be used to estimate the covariance matrix within a block, 
and our results continue to apply. 

3.2. Impact on risk management. Risk management is a different story from portfolio 
allocation. As mentioned in Section 1.1, the smallest and largest eigenvalues of the 
covariance matrix are related to the minimum and maximum variances of the selected 
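portfolio, respectively. Throughout this section, we fix a sequence of selected portfolios
$\boldsymbol{\xi}_n \in \mathbb{R}^p$ with $\boldsymbol{\xi}_n' \mathbf{1} = 1$ and $\boldsymbol{\xi}_n = O(1)\mathbf{1}$. Here we impose the condition $\boldsymbol{\xi}_n = O(1)\mathbf{1}$ to
avoid extreme short positions, that is, some large negative components in $\boldsymbol{\xi}_n$. Then,
the variance of portfolio $\boldsymbol{\xi}_n$ is

$\mathrm{var}(\boldsymbol{\xi}_n' \mathbf{y}) = \boldsymbol{\xi}_n' \mathrm{cov}(\mathbf{y}) \boldsymbol{\xi}_n = \boldsymbol{\xi}_n' \Sigma_n \boldsymbol{\xi}_n.$

The estimated risk associated with portfolio $\boldsymbol{\xi}_n$ is $\boldsymbol{\xi}_n' \hat{\Sigma}_n \boldsymbol{\xi}_n$. For practical use in risk
management, we need to examine the behavior of portfolio variance based on $\hat{\Sigma}_n$ estimated
from historical data.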

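Theorem 7 (Weak convergence of variance). Under conditions (A) and (B), we
have

$\boldsymbol{\xi}_n' \hat{\Sigma}_n \boldsymbol{\xi}_n - \boldsymbol{\xi}_n' \Sigma_n \boldsymbol{\xi}_n = o_P\{(p^4 K^2 \log n / n)^{1/2}\}$

and

$\boldsymbol{\xi}_n' \hat{\Sigma}_{\mathrm{sam}} \boldsymbol{\xi}_n - \boldsymbol{\xi}_n' \Sigma_n \boldsymbol{\xi}_n = o_P\{(p^4 K^2 \log n / n)^{1/2}\}.$

On the other hand, if the portfolios $\boldsymbol{\xi}_n$'s have no short positions, then we have

$\boldsymbol{\xi}_n' \hat{\Sigma}_n \boldsymbol{\xi}_n - \boldsymbol{\xi}_n' \Sigma_n \boldsymbol{\xi}_n = o_P\{(p^2 K^2 \log n / n)^{1/2}\}$

and

$\boldsymbol{\xi}_n' \hat{\Sigma}_{\mathrm{sam}} \boldsymbol{\xi}_n - \boldsymbol{\xi}_n' \Sigma_n \boldsymbol{\xi}_n = o_P\{(p^2 K^2 \log n / n)^{1/2}\}.$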

From this theorem, we see that $\hat{\Sigma}$ behaves roughly the same as the sample covariance
matrix estimator $\hat{\Sigma}_{\mathrm{sam}}$ in risk management. This is essential for both covariance matrix
estimators, since risk management does not involve the inverse of the covariance matrix, but
the covariance matrix itself. The above theorem is implied by consistency results of $\hat{\Sigma}$
and $\hat{\Sigma}_{\mathrm{sam}}$ under the Frobenius norm in Theorem 1.
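
The link to Theorem 1 can be sketched in a few lines. The display below is an illustration rather than the authors' proof; it uses only the elementary bound of a quadratic form by the Frobenius norm, and the notation $\|\cdot\|_2$ for the Euclidean norm of a vector is introduced here for the purpose of the sketch.

```latex
\begin{align*}
\bigl|\boldsymbol{\xi}_n' \hat{\Sigma}_n \boldsymbol{\xi}_n - \boldsymbol{\xi}_n' \Sigma_n \boldsymbol{\xi}_n\bigr|
  &= \bigl|\boldsymbol{\xi}_n' (\hat{\Sigma}_n - \Sigma_n) \boldsymbol{\xi}_n\bigr|
   \le \|\boldsymbol{\xi}_n\|_2^2 \, \|\hat{\Sigma}_n - \Sigma_n\|, \\
\boldsymbol{\xi}_n = O(1)\mathbf{1}
  &\;\Longrightarrow\; \|\boldsymbol{\xi}_n\|_2^2 = O(p)
   \;\Longrightarrow\; \text{error} = O_P(n^{-1/2} p^2 K), \\
\xi_{n,i} \ge 0,\ \textstyle\sum_{i=1}^{p} \xi_{n,i} = 1
  &\;\Longrightarrow\; \|\boldsymbol{\xi}_n\|_2^2 \le 1
   \;\Longrightarrow\; \text{error} = O_P(n^{-1/2} p K).
\end{align*}
```

Since $\log n \to \infty$, these two bounds are $o_P\{(p^4 K^2 \log n / n)^{1/2}\}$ and $o_P\{(p^2 K^2 \log n / n)^{1/2}\}$, respectively, and the same argument applies with $\hat{\Sigma}_{\mathrm{sam}}$ in place of $\hat{\Sigma}_n$.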

4. A simulation study. In this section we use a simulation study to illustrate and
augment our theoretical results and to verify finite-sample performance of the estimator
$\hat{\Sigma}$ as well as $\hat{\Sigma}^{-1}$. To this end, we fix sample size n = 756, which is the practical
sample size of three-year daily financial data, and we let dimensionality p grow from low
to high and ultimately exceed sample size. As mentioned before, our primary concern
is a theoretical understanding of factor models with a diverging number of variables
and factors for the purpose of covariance matrix estimation, but not comparison with
other popular estimators. So we compare performance of the estimator $\hat{\Sigma}$ only to that
of sample covariance matrix $\hat{\Sigma}_{\mathrm{sam}}$. To contrast with $\hat{\Sigma}_{\mathrm{sam}}$, we examine the covariance
matrix estimation errors of $\hat{\Sigma}$ and $\hat{\Sigma}_{\mathrm{sam}}$ under the Frobenius norm, the norm $\|\cdot\|_{\Sigma}$
introduced in Section 2, and the Stein (or entropy) loss function

$L(\hat{\Sigma}, \Sigma) = \mathrm{tr}(\hat{\Sigma} \Sigma^{-1}) - \log|\hat{\Sigma} \Sigma^{-1}| - p,$

which was proposed by James and Stein (1961). Meanwhile, we compare estimation
errors of $\hat{\Sigma}^{-1}$ and $\hat{\Sigma}_{\mathrm{sam}}^{-1}$ under the Frobenius norm. Furthermore, we evaluate estimated
variances of optimal portfolios with expected rate of return $\gamma_n = 10\%$ based on $\hat{\Sigma}$
and $\hat{\Sigma}_{\mathrm{sam}}$ by comparing their mean-squared errors (MSEs). For the estimated global
minimum variances, we also compare their MSEs. Moreover, we examine MSEs of
estimated variances of the equally weighted portfolio $\boldsymbol{\xi}_p = (1/p, \ldots, 1/p)$, based on $\hat{\Sigma}$
and $\hat{\Sigma}_{\mathrm{sam}}$, respectively.

For simplicity, we fix K = 3 in our simulation and consider the three-factor model 

(4.1)  $Y_{pi} = b_{pi1} f_1 + b_{pi2} f_2 + b_{pi3} f_3 + \varepsilon_i, \quad i = 1, \ldots, p.$

Here, we use the first subscript p to stress that the three-factor model varies across
dimensionality p. As before, we let $\mathbf{y} = (Y_1, \ldots, Y_p)'$ and $\mathbf{f} = (f_1, f_2, f_3)'$. The
Fama-French three-factor model [Fama and French (1993)] is a practical example of model
(4.1). To make our simulation more realistic, we take the parameters from a fit of the
Fama-French three-factor model.

In the Fama-French three-factor model, $Y_i$ is the excess return of the i-th stock or
portfolio, $i = 1, \ldots, p$. The first factor $f_1$ is the excess return of the proxy of the market
portfolio, which is the value-weighted return on all NYSE, AMEX and NASDAQ stocks
(from CRSP) minus the one-month Treasury bill rate (from Ibbotson Associates). The
other two factors are constructed using six value-weighted portfolios formed on size and
book-to-market. Specifically, the second factor $f_2$, SMB (Small Minus Big),

SMB = 1/3 (Small Value + Small Neutral + Small Growth)
      - 1/3 (Big Value + Big Neutral + Big Growth)

is the average return on the three small portfolios minus the average return on the three
big portfolios, and the third factor $f_3$, HML (High Minus Low),

HML = 1/2 (Small Value + Big Value) 

- 1/2 (Small Growth + Big Growth) 

is the average return on the two value portfolios minus the average return on the two 
growth portfolios. See their website http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html
for more details about their three factors and the
data sets of the three factors, risk free interest rates, and returns of many constructed 
portfolios. 

We first fit three-factor model (4.1) with n = 756 and p = 30 using the three-year
daily data of 30 Industry Portfolios from May 1, 2002 to Aug. 29, 2005, which are available
at the above website. Then, as in (1.4), we get 30 estimated factor loading vectors
$\hat{\mathbf{b}}_1 = (\hat{b}_{11}, \hat{b}_{12}, \hat{b}_{13})$, ..., $\hat{\mathbf{b}}_{30} = (\hat{b}_{30,1}, \hat{b}_{30,2}, \hat{b}_{30,3})$ and 30 estimated standard deviations
$\hat{\sigma}_1, \ldots, \hat{\sigma}_{30}$ of the errors, where $\hat{\mathbf{b}}_i$ and $\hat{\sigma}_i$ correspond to the i-th portfolio, $i = 1, \ldots, 30$.
The sample average of $\hat{\sigma}_1, \ldots, \hat{\sigma}_{30}$ is 0.66081 with a sample standard deviation 0.3275.
We report in Table 1 the sample means and sample covariance matrices of f and b
denoted by $\mu_f$, $\mu_b$ and $\mathrm{cov}_f$, $\mathrm{cov}_b$, respectively.



Table 1
Sample means and sample covariance matrices of f and b

    μ_f          cov_f
    0.023558      1.2507      -0.034999    -0.20419
    0.012989     -0.034999     0.31564     -0.0022526
    0.020714     -0.20419     -0.0022526    0.19303

    μ_b          cov_b
    0.78282       0.029145     0.023873     0.010184
    0.51803       0.023873     0.053951    -0.006967
    0.41003       0.010184    -0.006967     0.086856



For each simulation, we carry out the following steps: 

• We first generate a random sample of $\mathbf{f} = (f_1, f_2, f_3)'$ with size n = 756 from the
trivariate normal distribution $N(\mu_f, \mathrm{cov}_f)$.
• Then, for each dimensionality p increasing from 16 to 1000 with increment 20, we
do the following.

• Generate p factor loading vectors $\mathbf{b}_1, \dots, \mathbf{b}_p$ as a random sample of size p from
the trivariate normal distribution $N(\mu_b, \mathrm{cov}_b)$.

• Generate p standard deviations $\sigma_1, \dots, \sigma_p$ of the errors as a random sample of
size p from a gamma distribution $G(\alpha, \beta)$ conditional on being bounded below by
a threshold value. The threshold for the standard deviations of errors is required
in accordance with condition (C) in Section 2.1, and it is set to 0.1950 in our
simulation because we find $\min_{1 \le i \le 30} \hat{\sigma}_i = 0.1950$. Note that $G(\alpha, \beta)$ has mean
$\alpha\beta$ and standard deviation $\alpha^{1/2}\beta$, and its conditional mean and conditional second
moment on falling above 0.1950 can be approximated respectively by



where p is the probability of falling below 0.1950 under $G(\alpha, \beta)$. By matching the
mean 0.66081 and standard deviation 0.3275 for $G(\alpha_0, \beta_0)$, we obtain $\alpha_0 = 4.0713$
and $\beta_0 = 0.1623$. Therefore, following the above approximations, by recursively
matching the conditional mean 0.66081 and conditional second moment $0.3275^2 + 0.66081^2 = 0.54393$
for $G(\alpha, \beta)$, we finally get $\alpha = 3.3586$ and $\beta = 0.1876$.

• After getting p standard deviations $\sigma_1, \dots, \sigma_p$ of the errors, we generate a random
sample of $\varepsilon = (\varepsilon_1, \dots, \varepsilon_p)'$ with size n = 756 from the p-variate normal distribution
$N(0, \mathrm{diag}(\sigma_1^2, \dots, \sigma_p^2))$.

• Then from model (4.1), we get a random sample of $\mathbf{y} = (Y_1, \dots, Y_p)'$ with size
n = 756.

• Finally, we compute the estimated covariance matrices $\hat{\Sigma}$ and $\hat{\Sigma}_{sam}$, as well as $\hat{\Sigma}^{-1}$
and $\hat{\Sigma}_{sam}^{-1}$, and record the errors in the aforementioned measures. Meanwhile, we
calculate MSEs of estimated variances of the optimal portfolios with $\gamma_n = 10\%$
as well as MSEs of estimated global minimum variances based on $\hat{\Sigma}$ and $\hat{\Sigma}_{sam}$,
respectively. Also, we record MSEs of estimated variances of the equally weighted
portfolio based on $\hat{\Sigma}$ and $\hat{\Sigma}_{sam}$, respectively.
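
The unconditional moment matching for $G(\alpha_0, \beta_0)$ in the steps above can be checked numerically. The sketch below solves $\alpha\beta = \text{mean}$ and $\alpha^{1/2}\beta = \text{sd}$, and then draws error standard deviations from the truncated gamma by simple rejection sampling. The helper names are hypothetical, and rejection sampling is an assumption on our part: the text does not say how the conditional draws were produced.

```python
import numpy as np


def match_gamma(mean, sd):
    """Shape alpha and scale beta of a gamma law with the given mean and sd,
    using mean = alpha * beta and sd = sqrt(alpha) * beta."""
    alpha = (mean / sd) ** 2
    beta = sd ** 2 / mean
    return alpha, beta


def sample_truncated_gamma(alpha, beta, lower, size, rng):
    """Draw from G(alpha, beta) conditional on being >= lower (rejection sampling)."""
    kept = np.empty(0)
    while kept.size < size:
        draw = rng.gamma(shape=alpha, scale=beta, size=2 * size)
        kept = np.concatenate([kept, draw[draw >= lower]])
    return kept[:size]


alpha0, beta0 = match_gamma(0.66081, 0.3275)   # roughly (4.0713, 0.1623)
rng = np.random.default_rng(0)
sigmas = sample_truncated_gamma(3.3586, 0.1876, 0.1950, size=200, rng=rng)
```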

We repeat the above simulation 500 times and report the mean-square errors as well as 
the standard deviations of those errors. 
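
A single replicate of this design can be sketched as follows. The estimator formulas follow the definitions used in the proofs ($\hat{\mathbf{B}} = \mathbf{Y}\mathbf{X}'(\mathbf{X}\mathbf{X}')^{-1}$, the sample covariance of the factors with divisor $n - 1$, and residual variances with divisor $n$); the function names, the seed, and the simple rejection step for the truncated gamma are assumptions made for illustration only.

```python
import numpy as np

# Table 1 values: sample means and covariances of the factors and loadings.
MU_F = np.array([0.023558, 0.012989, 0.020714])
COV_F = np.array([[1.2507, -0.034999, -0.20419],
                  [-0.034999, 0.31564, -0.0022526],
                  [-0.20419, -0.0022526, 0.19303]])
MU_B = np.array([0.78282, 0.51803, 0.41003])
COV_B = np.array([[0.029145, 0.023873, 0.010184],
                  [0.023873, 0.053951, -0.006967],
                  [0.010184, -0.006967, 0.086856]])


def factor_cov(Y, X):
    """Factor-model based estimate B_hat cov_hat(f) B_hat' + diag(residual variances)."""
    n = X.shape[1]
    B_hat = Y @ X.T @ np.linalg.inv(X @ X.T)      # p x K least-squares loadings
    resid = Y - B_hat @ X                         # p x n residual matrix
    sigma0_hat = np.diag(np.sum(resid ** 2, axis=1) / n)
    return B_hat @ np.cov(X) @ B_hat.T + sigma0_hat


def one_replicate(p, n, rng):
    """Simulate one data set of dimension p; return (Sigma, Sigma_hat, Sigma_sam)."""
    X = rng.multivariate_normal(MU_F, COV_F, size=n).T    # K x n factors
    B = rng.multivariate_normal(MU_B, COV_B, size=p)      # p x K loadings
    draws = rng.gamma(shape=3.3586, scale=0.1876, size=4 * p)
    sigmas = draws[draws >= 0.1950][:p]                   # truncated gamma draws
    E = rng.normal(size=(p, n)) * sigmas[:, None]         # p x n errors
    Y = B @ X + E
    Sigma = B @ COV_F @ B.T + np.diag(sigmas ** 2)        # population covariance
    return Sigma, factor_cov(Y, X), np.cov(Y)


rng = np.random.default_rng(0)
Sigma, Sigma_hat, Sigma_sam = one_replicate(p=16, n=756, rng=rng)
err_factor = np.linalg.norm(Sigma_hat - Sigma, "fro")
err_sample = np.linalg.norm(Sigma_sam - Sigma, "fro")
```

Looping `one_replicate` over the grid of p and over 500 seeds, and averaging the recorded errors, would give curves of the kind described for Figures 1-4.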




Figure 1: (a), (c) and (e): The averages of errors over 500 simulations for $\hat{\Sigma}$ (solid curve) and $\hat{\Sigma}_{sam}$ (dashed
curve) against p under Frobenius norm, norm $\|\cdot\|_{\Sigma}$ and entropy losses, respectively. (b), (d) and (f): Corresponding
standard deviations of errors over 500 simulations for $\hat{\Sigma}$ (solid curve) and $\hat{\Sigma}_{sam}$ (dashed curve).

Figure 2: (a) The averages of errors under Frobenius norm over 500 simulations for $\hat{\Sigma}^{-1}$ (solid curve) and
$\hat{\Sigma}_{sam}^{-1}$ (dashed curve) against p. (b) Corresponding standard deviations of errors under Frobenius norm.



In Figures 1-4, solid curves and dashed curves correspond to $\hat{\Sigma}$ and $\hat{\Sigma}_{sam}$, respectively.
Figure 1 presents the averages and the standard deviations of their estimation
errors under the Frobenius norm, norm $\|\cdot\|_{\Sigma}$ and entropy loss against dimensionality
p, respectively. Figure 2 depicts the averages and the standard deviations of estimation
errors of $\hat{\Sigma}^{-1}$ and $\hat{\Sigma}_{sam}^{-1}$ under the Frobenius norm against p. We report in Figure 3
MSEs of estimated variances of the optimal portfolios with $\gamma_n = 10\%$ as well as MSEs
of estimated global minimum variances using $\hat{\Sigma}$ and $\hat{\Sigma}_{sam}$ against p. Figure 4 presents
MSEs of estimated variances of the equally weighted portfolio based on $\hat{\Sigma}$ and $\hat{\Sigma}_{sam}$
against p.

Recall that both the sample size n and the number of factors K are kept fixed across 
p in our simulation. From Figures 1-4, we observe the following: 

• By comparing corresponding averages and standard deviations of the errors shown 
in Figures 1 and 2, we see that the Monte-Carlo errors are negligible. 

• Figure 1(a) shows that under the Frobenius norm, $\hat{\Sigma}$ performs roughly the same
as (slightly better than) $\hat{\Sigma}_{sam}$, which is consistent with the results in Theorem 1.
Nevertheless, this is a surprise and is against the conventional wisdom.

• Figure 1(c) reveals that under norm $\|\cdot\|_{\Sigma}$, $\hat{\Sigma}$ performs much better than $\hat{\Sigma}_{sam}$,
which is consistent with the results in Theorem 2. In particular, we see that the
estimation errors of $\hat{\Sigma}$ under norm $\|\cdot\|_{\Sigma}$ are roughly at the same level across p.
Recall that sample size n is fixed as 756 here. Thus, this is in line with the
root-n-consistency of $\hat{\Sigma}$ under norm $\|\cdot\|_{\Sigma}$ when $p = O(n)$ shown in Theorem 2. Also,
the apparent growth pattern of estimation errors in $\hat{\Sigma}_{sam}$ with p is in accordance
with its $(n/p)^{1/2}$-consistency under norm $\|\cdot\|_{\Sigma}$ shown in Theorem 2.

• Figure 1(e) shows that under entropy loss, $\hat{\Sigma}$ significantly outperforms $\hat{\Sigma}_{sam}$, which
strongly supports the factor-model based estimator $\hat{\Sigma}$ over the sample one $\hat{\Sigma}_{sam}$.
We only report the results for p truncated at 400. This is because for larger
p, the sample covariance matrices $\hat{\Sigma}_{sam}$ are nearly singular with high probability in the
simulation, which results in extremely large entropy losses.

• From Figure 2(a), we see that under the Frobenius norm, the estimator $\hat{\Sigma}^{-1}$
significantly outperforms $\hat{\Sigma}_{sam}^{-1}$, which is in line with the results in Theorem 3.

• Figures 3(a) and 3(b) demonstrate convincingly that $\hat{\Sigma}$ outperforms $\hat{\Sigma}_{sam}$ in
portfolio allocation. These results are in accordance with Theorems 5 and 6. One may
notice that in Figure 3(a), the MSEs are relatively large in magnitude for small
p and then tend to stabilize when p grows large. This is because in our settings
for the simulation, for small p the term $\varphi_n\phi_n - \psi_n^2$ is relatively small compared
to $\varphi_n\gamma_n^2 - 2\psi_n\gamma_n + \phi_n$, which results in large variance of the optimal portfolio.
The behavior of the MSEs for large p is essentially due to self-averaging in the
dimensionality. Figure 3(b) can be interpreted in the same way.

• Figure 4 reveals that the factor-model based approach and the sample approach 
have almost the same performance in risk management, which is consistent with 
Theorem 7. The high-dimensionality behavior is essentially due to self-averaging 
as in Figure 3(a). 
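
For concreteness, the three error measures compared in Figures 1 and 2 can be computed as in the sketch below. The norm $\|A\|_{\Sigma} = p^{-1/2}\|\Sigma^{-1/2}A\Sigma^{-1/2}\|$ is the one used in the proofs; the entropy loss is written here in the usual Stein form $\mathrm{tr}(\hat{\Sigma}\Sigma^{-1}) - \log|\hat{\Sigma}\Sigma^{-1}| - p$, which is our assumption about the definition given earlier in the paper. Function names are hypothetical.

```python
import numpy as np


def frobenius_error(S_hat, S):
    """Frobenius norm of the estimation error."""
    return np.linalg.norm(S_hat - S, "fro")


def sigma_norm_error(S_hat, S):
    """p^{-1/2} * || S^{-1/2} (S_hat - S) S^{-1/2} ||, computed through the
    identity ||S^{-1/2} A S^{-1/2}||^2 = tr[(A S^{-1})^2]."""
    p = S.shape[0]
    M = (S_hat - S) @ np.linalg.inv(S)
    return np.sqrt(np.trace(M @ M) / p)


def entropy_loss(S_hat, S):
    """Stein-type entropy loss tr(S_hat S^{-1}) - log det(S_hat S^{-1}) - p."""
    p = S.shape[0]
    M = S_hat @ np.linalg.inv(S)
    _, logdet = np.linalg.slogdet(M)
    return np.trace(M) - logdet - p
```

When $\hat{\Sigma}$ is nearly singular the log-determinant term becomes very large in magnitude, which is the mechanism behind the truncation at p = 400 mentioned for Figure 1(e).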

5. Concluding remarks. This paper investigates the impact of dimensionality on 
the estimation of covariance matrices. Two estimators are singled out for studies and 
comparisons: the sample covariance matrix and the factor-model based estimate. The 
inverse of the covariance matrix takes advantage of the factor structure and hence can 
be better estimated in the factor approach. As a result, when the parameters involve the 
inverse of the population covariance, substantial gain can be made. On the other hand, 
the covariance matrix itself does not take much advantage of the factor structure, and 
hence its estimate cannot be improved much in the factor approach. This is somewhat
surprising and is against the conventional wisdom. 
Figure 3: (a) The MSEs of estimated variances of the optimal portfolios with $\gamma_n = 10\%$ over 500 simulations
based on $\hat{\Sigma}$ (solid curve) and $\hat{\Sigma}_{sam}$ (dashed curve) against p. (b) The MSEs of estimated global minimum
variances over 500 simulations based on $\hat{\Sigma}$ (solid curve) and $\hat{\Sigma}_{sam}$ (dashed curve) against p.

Figure 4: The MSEs of estimated variances of the equally weighted portfolio over 500 simulations based on $\hat{\Sigma}$
(solid curve) and $\hat{\Sigma}_{sam}$ (dashed curve) against p.

Optimal portfolio allocation and minimum variance portfolio involve the inverse of
the covariance matrix. Hence, it is advantageous to employ the factor structure in 
portfolio allocation. On the other hand, intrinsically the risk management does not 
depend on the covariance structure and hence there is no advantage to appeal to the 
factor model in risk management. 

Our conclusion is also verified by an extensive simulation study, in which the
parameters are taken in a neighborhood that is close to reality. The choice of parameters
relies on a fit of the famous Fama-French three-factor model to the portfolios traded in
the market.

Our studies also reveal that the impact of dimensionality on the estimation of co- 
variance matrices is severe. This should be taken into consideration in practical imple- 
mentations. 

6. Proofs of theorems. In this section, we give rigorous proofs of Theorems 1-7. 

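PROOF of Theorem 1. (1) First, we prove $(pK)^{-1}n^{1/2}$-consistency of $\hat{\Sigma}$ under
the Frobenius norm. To facilitate the presentation, we introduce here some notation
used throughout the rest of the paper. Let $\mathbf{C}_n = \mathbf{E}\mathbf{X}'(\mathbf{X}\mathbf{X}')^{-1}$,

$\mathbf{D}_n = \{(n-1)^{-1}\mathbf{X}\mathbf{X}' - [n(n-1)]^{-1}\mathbf{X}\mathbf{1}\mathbf{1}'\mathbf{X}'\} - \mathrm{cov}(\mathbf{f})$

and

$\mathbf{F}_n = I_p \circ \{n^{-1}\mathbf{E}(I_n - \mathbf{H})\mathbf{E}'\} - \Sigma_0$,

where $\mathbf{H} = \mathbf{X}'(\mathbf{X}\mathbf{X}')^{-1}\mathbf{X}$ is the $n \times n$ hat matrix and $A_1 \circ A_2$ stands for the Hadamard
product, i.e. the entrywise product, for any $q \times r$ matrices $A_1$ and $A_2$. Then we
have $\hat{\mathbf{B}} = \mathbf{Y}\mathbf{X}'(\mathbf{X}\mathbf{X}')^{-1} = \mathbf{B} + \mathbf{C}_n$, $\widehat{\mathrm{cov}}(\mathbf{f}) = (n-1)^{-1}\mathbf{X}\mathbf{X}' - \{n(n-1)\}^{-1}\mathbf{X}\mathbf{1}\mathbf{1}'\mathbf{X}' = \mathrm{cov}(\mathbf{f}) + \mathbf{D}_n$,
$\hat{\Sigma}_0 = \mathrm{diag}(n^{-1}\hat{\mathbf{E}}\hat{\mathbf{E}}') = \Sigma_0 + \mathbf{F}_n$ and

(6.1) $\hat{\Sigma} = \Sigma + \mathbf{B}\mathbf{D}_n\mathbf{B}' + [\mathbf{B}\,\widehat{\mathrm{cov}}(\mathbf{f})\,\mathbf{C}_n' + \mathbf{C}_n\,\widehat{\mathrm{cov}}(\mathbf{f})\,\mathbf{B}'] + \mathbf{C}_n\,\widehat{\mathrm{cov}}(\mathbf{f})\,\mathbf{C}_n' + \mathbf{F}_n$.

This shows that $\hat{\Sigma}$ is a four-term perturbation of the population covariance matrix,
and this representation is our key technical tool. By the Cauchy-Schwarz inequality, it
follows from (6.1) that

$E\|\hat{\Sigma} - \Sigma\|^2 \le 4\big[E\,\mathrm{tr}\{(\mathbf{B}\mathbf{D}_n\mathbf{B}')^2\} + E\,\mathrm{tr}\{[\mathbf{B}\,\widehat{\mathrm{cov}}(\mathbf{f})\,\mathbf{C}_n' + \mathbf{C}_n\,\widehat{\mathrm{cov}}(\mathbf{f})\,\mathbf{B}']^2\} + E\,\mathrm{tr}\{[\mathbf{C}_n\,\widehat{\mathrm{cov}}(\mathbf{f})\,\mathbf{C}_n']^2\} + E\,\mathrm{tr}(\mathbf{F}_n^2)\big]$.

We will examine each of the above four terms on the right hand side separately. For
brevity of notation, we suppress the first subscript n in some situations where the
dependence on n is self-evident.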
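
The four-term representation (6.1) is an exact algebraic identity, so it can be verified numerically on simulated data. The following sketch does so for small, arbitrarily chosen dimensions; all numerical values and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
p, K, n = 5, 2, 50
B = rng.normal(size=(p, K))
cov_f = np.array([[1.0, 0.3], [0.3, 2.0]])
Sigma0 = np.diag(rng.uniform(0.5, 1.5, size=p))
X = rng.multivariate_normal(np.zeros(K), cov_f, size=n).T        # K x n factors
E = rng.normal(size=(p, n)) * np.sqrt(np.diag(Sigma0))[:, None]  # p x n errors
Y = B @ X + E

XXinv = np.linalg.inv(X @ X.T)
H = X.T @ XXinv @ X                                   # n x n hat matrix
one = np.ones((n, 1))
cov_hat = X @ X.T / (n - 1) - X @ one @ one.T @ X.T / (n * (n - 1))

C = E @ X.T @ XXinv                                   # C_n
D = cov_hat - cov_f                                   # D_n
F = np.diag(np.diag(E @ (np.eye(n) - H) @ E.T) / n) - Sigma0   # F_n

# Factor-model based estimator computed directly from the data.
B_hat = Y @ X.T @ XXinv
resid = Y - B_hat @ X
Sigma_hat = B_hat @ cov_hat @ B_hat.T + np.diag(np.diag(resid @ resid.T) / n)

# Right-hand side of (6.1).
Sigma = B @ cov_f @ B.T + Sigma0
rhs = (Sigma + B @ D @ B.T
       + (B @ cov_hat @ C.T + C @ cov_hat @ B.T)
       + C @ cov_hat @ C.T + F)
```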

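Before going further, let us bound $\|\mathbf{B}_n\|$. From assumption (B), we know that
$\mathrm{cov}(\mathbf{f}) \ge \sigma_1 I_K$, where for any symmetric positive semidefinite matrices $A_1$ and $A_2$,
$A_1 \ge A_2$ means $A_1 - A_2$ is positive semidefinite. Thus it follows easily from (1.3) that

$\sigma_1\mathbf{B}_n\mathbf{B}_n' = \mathbf{B}_n(\sigma_1 I_K)\mathbf{B}_n' \le \mathbf{B}_n\mathrm{cov}(\mathbf{f})\mathbf{B}_n' \le \Sigma_n$,

which along with $b_n = O(p)$ in assumption (B) shows that $\|\mathbf{B}_n\|^2 = \mathrm{tr}(\mathbf{B}_n\mathbf{B}_n') \le \mathrm{tr}(\Sigma_n)/\sigma_1 = O(p)$, i.e.

(6.2) $\|\mathbf{B}_n\| = O(p^{1/2})$.

Clearly, $\|\mathbf{B}_n'\mathbf{B}_n\| = \|\mathbf{B}_n\mathbf{B}_n'\|$, and by (A.1) in Lemma 1 and (6.2) we have

(6.3) $\|\mathbf{B}_n'\mathbf{B}_n\| = \|\mathbf{B}_n\mathbf{B}_n'\| \le \|\mathbf{B}_n\|\,\|\mathbf{B}_n'\| = \|\mathbf{B}_n\|^2 = O(p)$.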



This fact is a key observation that will be used very often, and as shown above, it is 
entailed only by assumptions (A) and (B), which are valid throughout the paper. 

Now we consider the first term, say $E\,\mathrm{tr}\{(\mathbf{B}\mathbf{D}_n\mathbf{B}')^2\}$. From $c_n = O(1)$ in assumption
(B), we see that the fourth moments of $\mathbf{f}$ are bounded across n, thus a routine calculation
reveals that

(6.4) $E(\|\mathbf{D}_n\|^2) = O(n^{-1}K^2)$,

which is an important fact that will be used very often and also helps study the inverse
$\widehat{\mathrm{cov}}(\mathbf{f})^{-1}$ by keeping in mind that $K \to \infty$. By (A.2) in Lemma 1, (6.3), and (6.4), we
have

(6.5) $E\,\mathrm{tr}\{(\mathbf{B}\mathbf{D}_n\mathbf{B}')^2\} \le \|\mathbf{B}'\mathbf{B}\|^2 E(\|\mathbf{D}_n\|^2) = O(n^{-1}(pK)^2)$.

The remaining three terms are taken care of by Lemmas 2 and 3. Therefore, in view
of (6.3), combining (6.5) with (A.5)-(A.7) in Lemmas 2 and 3 gives

$E\|\hat{\Sigma} - \Sigma\|^2 = O(n^{-1}(pK)^2)$.

In particular, this implies that $\|\hat{\Sigma} - \Sigma\| = O_P(n^{-1/2}pK)$, which proves $(pK)^{-1}n^{1/2}$-consistency
of the covariance matrix estimator $\hat{\Sigma}$ under Frobenius norm.
(2) Then, we show that $\hat{\Sigma}_{sam}$ is $(pK)^{-1}n^{1/2}$-consistent under the Frobenius norm.
By (USD and (JO}, we have 



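(6.6) $\hat{\Sigma}_{sam} = \Sigma + \mathbf{B}\mathbf{D}_n\mathbf{B}' + \mathbf{G}_n + (n-1)^{-1}\{\mathbf{B}\mathbf{X}\mathbf{E}' + \mathbf{E}\mathbf{X}'\mathbf{B}'\} - [n(n-1)]^{-1}\{\mathbf{B}\mathbf{X}\mathbf{1}\mathbf{1}'\mathbf{E}' + \mathbf{E}\mathbf{1}\mathbf{1}'\mathbf{X}'\mathbf{B}'\}$,

where $\mathbf{G}_n = \{(n-1)^{-1}\mathbf{E}\mathbf{E}' - [n(n-1)]^{-1}\mathbf{E}\mathbf{1}\mathbf{1}'\mathbf{E}'\} - \Sigma_0$. This shows that $\hat{\Sigma}_{sam}$ is also
a four-term perturbation of the population covariance matrix. By the Cauchy-Schwarz
inequality, it follows from (6.6) that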



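$E\|\hat{\Sigma}_{sam} - \Sigma\|^2 \le 4\big[E\|\mathbf{B}\mathbf{D}_n\mathbf{B}'\|^2 + E\|\mathbf{G}_n\|^2 + 2(n-1)^{-2}E\|\mathbf{B}\mathbf{X}\mathbf{E}'\|^2 + 2[n(n-1)]^{-2}E\|\mathbf{B}\mathbf{X}\mathbf{1}\mathbf{1}'\mathbf{E}'\|^2\big]$.

As in part (1), we will examine each of the above four terms on the right hand side
separately. The first term $E\|\mathbf{B}\mathbf{D}_n\mathbf{B}'\|^2$ has been bounded in (6.5). Using the same
argument as in Lemma 6, we can show that $E\|\mathbf{G}_n\|^2 = O(n^{-1}p^2)$. In view of (6.3), it
is shown that

$E\|\mathbf{B}\mathbf{X}\mathbf{E}'\|^2 = O(np^2K)$

in the proof of Lemma 2. Using the same argument as in Lemma 2 to bound $E\|\mathbf{B}\mathbf{X}\mathbf{1}\mathbf{1}'\mathbf{H}\mathbf{E}'\|^2$,
we can easily get

$E\|\mathbf{B}\mathbf{X}\mathbf{1}\mathbf{1}'\mathbf{E}'\|^2 = O(n^3p^2K)$,

which along with (6.5) and the above results yields

$E\|\hat{\Sigma}_{sam} - \Sigma\|^2 = O(n^{-1}(pK)^2)$.

This proves $(pK)^{-1}n^{1/2}$-consistency of $\hat{\Sigma}_{sam}$ under the Frobenius norm.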

(3) Finally, we prove the uniform weak convergence of eigenvalues. It follows from
Corollary 6.3.8 of Horn and Johnson (1985) that

$\max_{1 \le k \le p}|\lambda_k(\hat{\Sigma}_n) - \lambda_k(\Sigma_n)| \le \Big\{\sum_{k=1}^{p}[\lambda_k(\hat{\Sigma}_n) - \lambda_k(\Sigma_n)]^2\Big\}^{1/2} \le \|\hat{\Sigma}_n - \Sigma_n\|$.

Therefore, the uniform weak convergence of the eigenvalues of the $\hat{\Sigma}_n$'s follows
immediately from the $(pK)^{-1}n^{1/2}$-consistency of $\hat{\Sigma}$ under the Frobenius norm shown in part
(1). Similarly, by the $(pK)^{-1}n^{1/2}$-consistency of $\hat{\Sigma}_{sam}$ under the Frobenius norm shown
in part (2), the same conclusion holds for $\hat{\Sigma}_{sam}$. □
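
The eigenvalue inequality used in part (3) can be illustrated numerically. In the sketch below, a symmetric matrix is perturbed by a small symmetric error, and the sorted eigenvalue gaps are compared with the Frobenius norm of the perturbation. The matrices are arbitrary test inputs, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(6, 6))
S = A @ A.T                              # symmetric positive semidefinite matrix
P = rng.normal(size=(6, 6))
S_hat = S + 0.1 * (P + P.T)              # symmetric perturbation of S

lam = np.linalg.eigvalsh(S)              # eigenvalues in ascending order
lam_hat = np.linalg.eigvalsh(S_hat)      # same ordering, so gaps pair up

max_gap = np.max(np.abs(lam_hat - lam))
l2_gap = np.sqrt(np.sum((lam_hat - lam) ** 2))
fro_err = np.linalg.norm(S_hat - S, "fro")
```

The chain `max_gap <= l2_gap <= fro_err` is what turns Frobenius-norm consistency into uniform convergence of the eigenvalues.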
PROOF of Theorem 2. (1) First, we show that $\hat{\Sigma}$ is $n^{\beta/2}$-consistent under norm
$\|\cdot\|_{\Sigma}$. The main idea of the proof is similar to that of Theorem 1, but the proof is more
delicate and involved here since the norm $\|\cdot\|_{\Sigma}$ involves the inverse of the covariance
matrix $\Sigma$. By the Cauchy-Schwarz inequality, it follows from (6.1) that

$E\|\hat{\Sigma} - \Sigma\|_{\Sigma}^2 \le 4\big[E\|\mathbf{B}\mathbf{D}_n\mathbf{B}'\|_{\Sigma}^2 + E\|\mathbf{B}\,\widehat{\mathrm{cov}}(\mathbf{f})\,\mathbf{C}_n' + \mathbf{C}_n\,\widehat{\mathrm{cov}}(\mathbf{f})\,\mathbf{B}'\|_{\Sigma}^2 + E\|\mathbf{C}_n\,\widehat{\mathrm{cov}}(\mathbf{f})\,\mathbf{C}_n'\|_{\Sigma}^2 + E\|\mathbf{F}_n\|_{\Sigma}^2\big]$.

As in the proof of Theorem 1, we will study each of the above four terms on the right
hand side separately.

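Before going further, let us bound $\|\mathbf{B}'\Sigma^{-1}\mathbf{B}\|$. From (1.3), we know that $\Sigma =
\Sigma_0 + \mathbf{B}\mathrm{cov}(\mathbf{f})\mathbf{B}'$, which along with the Sherman-Morrison-Woodbury formula shows that

(6.7) $\Sigma^{-1} = \Sigma_0^{-1} - \Sigma_0^{-1}\mathbf{B}[\mathrm{cov}(\mathbf{f})^{-1} + \mathbf{B}'\Sigma_0^{-1}\mathbf{B}]^{-1}\mathbf{B}'\Sigma_0^{-1}$.

Thus it follows that

$\mathbf{B}'\Sigma^{-1}\mathbf{B} = \mathbf{B}'\Sigma_0^{-1}\mathbf{B} - \mathbf{B}'\Sigma_0^{-1}\mathbf{B}[\mathrm{cov}(\mathbf{f})^{-1} + \mathbf{B}'\Sigma_0^{-1}\mathbf{B}]^{-1}\mathbf{B}'\Sigma_0^{-1}\mathbf{B}$
$= \mathbf{B}'\Sigma_0^{-1}\mathbf{B}[\mathrm{cov}(\mathbf{f})^{-1} + \mathbf{B}'\Sigma_0^{-1}\mathbf{B}]^{-1}\mathrm{cov}(\mathbf{f})^{-1}$
$= \mathrm{cov}(\mathbf{f})^{-1} - \mathrm{cov}(\mathbf{f})^{-1}[\mathrm{cov}(\mathbf{f})^{-1} + \mathbf{B}'\Sigma_0^{-1}\mathbf{B}]^{-1}\mathrm{cov}(\mathbf{f})^{-1}$,

which implies that

$\|\mathbf{B}'\Sigma^{-1}\mathbf{B}\| \le \|\mathrm{cov}(\mathbf{f})^{-1}\| + \|\mathrm{cov}(\mathbf{f})^{-1}[\mathrm{cov}(\mathbf{f})^{-1} + \mathbf{B}'\Sigma_0^{-1}\mathbf{B}]^{-1}\mathrm{cov}(\mathbf{f})^{-1}\|$.

Note that $\mathrm{cov}(\mathbf{f})^{-1}$ is symmetric positive definite and $\mathbf{B}'\Sigma_0^{-1}\mathbf{B}$ is symmetric positive
semidefinite. Thus, $\mathrm{cov}(\mathbf{f})^{-1} + \mathbf{B}'\Sigma_0^{-1}\mathbf{B} \ge \mathrm{cov}(\mathbf{f})^{-1}$, which in turn implies that
$[\mathrm{cov}(\mathbf{f})^{-1} + \mathbf{B}'\Sigma_0^{-1}\mathbf{B}]^{-1} \le \mathrm{cov}(\mathbf{f})$ and

$\mathrm{cov}(\mathbf{f})^{-1}[\mathrm{cov}(\mathbf{f})^{-1} + \mathbf{B}'\Sigma_0^{-1}\mathbf{B}]^{-1}\mathrm{cov}(\mathbf{f})^{-1} \le \mathrm{cov}(\mathbf{f})^{-1}\mathrm{cov}(\mathbf{f})\mathrm{cov}(\mathbf{f})^{-1} = \mathrm{cov}(\mathbf{f})^{-1}$.

In particular, this entails that

$\|\mathrm{cov}(\mathbf{f})^{-1}[\mathrm{cov}(\mathbf{f})^{-1} + \mathbf{B}'\Sigma_0^{-1}\mathbf{B}]^{-1}\mathrm{cov}(\mathbf{f})^{-1}\| \le \|\mathrm{cov}(\mathbf{f})^{-1}\|$,

so now the problem of bounding $\|\mathbf{B}'\Sigma^{-1}\mathbf{B}\|$ reduces to bounding $\|\mathrm{cov}(\mathbf{f})^{-1}\|$. By
assumption (B), $\lambda_{\min}(\mathrm{cov}(\mathbf{f})) \ge \sigma_1$ for some constant $\sigma_1 > 0$. Thus the largest eigenvalues
of $\mathrm{cov}(\mathbf{f})^{-1}$ are bounded across n, which easily implies that $\|\mathrm{cov}(\mathbf{f})^{-1}\| = O(K^{1/2})$. This
together with the above results shows that

(6.8) $\|\mathbf{B}'\Sigma^{-1}\mathbf{B}\| = O(K^{1/2})$.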
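
The Sherman-Morrison-Woodbury step (6.7) and the resulting expression for $\mathbf{B}'\Sigma^{-1}\mathbf{B}$ are exact identities, so a quick numerical check is possible. The sketch below uses small random inputs chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
p, K = 6, 2
B = rng.normal(size=(p, K))
cov_f = np.array([[1.0, 0.2], [0.2, 0.5]])
Sigma0 = np.diag(rng.uniform(0.5, 2.0, size=p))
Sigma = Sigma0 + B @ cov_f @ B.T

S0_inv = np.diag(1.0 / np.diag(Sigma0))          # inverse of a diagonal matrix
cov_inv = np.linalg.inv(cov_f)
inner = np.linalg.inv(cov_inv + B.T @ S0_inv @ B)

# (6.7): only a K x K matrix has to be inverted.
Sigma_inv = S0_inv - S0_inv @ B @ inner @ B.T @ S0_inv

lhs = B.T @ Sigma_inv @ B
rhs = cov_inv - cov_inv @ inner @ cov_inv
```

Since `rhs` equals `cov_inv` minus a positive semidefinite matrix, the Frobenius norm of `lhs` cannot exceed that of `cov_inv`, which is the bound behind (6.8).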






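Now we are ready to examine the first term, say $E\|\mathbf{B}\mathbf{D}_n\mathbf{B}'\|_{\Sigma}^2$. By (A.1) in Lemma
1, we have

$\|\mathbf{B}\mathbf{D}_n\mathbf{B}'\|_{\Sigma}^2 = p^{-1}\mathrm{tr}\{(\mathbf{D}_n\mathbf{B}'\Sigma^{-1}\mathbf{B})^2\} \le p^{-1}\|\mathbf{D}_n\|^2\|\mathbf{B}'\Sigma^{-1}\mathbf{B}\|^2$.

Therefore, it follows from (6.4) and (6.8) that

(6.9) $E\|\mathbf{B}\mathbf{D}_n\mathbf{B}'\|_{\Sigma}^2 = O(n^{-1}p^{-1}K^3)$.

Then, we consider the second term $E\|\mathbf{B}\,\widehat{\mathrm{cov}}(\mathbf{f})\,\mathbf{C}_n' + \mathbf{C}_n\,\widehat{\mathrm{cov}}(\mathbf{f})\,\mathbf{B}'\|_{\Sigma}^2$. Note that

(6.10) $E\|\mathbf{B}\,\widehat{\mathrm{cov}}(\mathbf{f})\,\mathbf{C}_n' + \mathbf{C}_n\,\widehat{\mathrm{cov}}(\mathbf{f})\,\mathbf{B}'\|_{\Sigma}^2 \le 2\big[E\|\mathbf{B}\,\widehat{\mathrm{cov}}(\mathbf{f})\,\mathbf{C}_n'\|_{\Sigma}^2 + E\|\mathbf{C}_n\,\widehat{\mathrm{cov}}(\mathbf{f})\,\mathbf{B}'\|_{\Sigma}^2\big]$
$= 4E\|\mathbf{B}\,\widehat{\mathrm{cov}}(\mathbf{f})\,\mathbf{C}_n'\|_{\Sigma}^2 \le 8\big[(n-1)^{-2}E\|\mathbf{B}\mathbf{X}\mathbf{X}'\mathbf{C}_n'\|_{\Sigma}^2 + n^{-2}(n-1)^{-2}E\|\mathbf{B}\mathbf{X}\mathbf{1}\mathbf{1}'\mathbf{X}'\mathbf{C}_n'\|_{\Sigma}^2\big]$
$= 8(n-1)^{-2}\mathcal{L}_1 + 8n^{-2}(n-1)^{-2}\mathcal{L}_2$.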

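Since $E(\varepsilon|\mathbf{f}) = 0$, conditioning on $\mathbf{X}$ gives

$\mathcal{L}_1 = p^{-1}E\,\mathrm{tr}[\mathbf{X}E(\mathbf{E}'\Sigma^{-1}\mathbf{E}|\mathbf{X})\mathbf{X}'\mathbf{B}'\Sigma^{-1}\mathbf{B}]$
$= p^{-1}E\,\mathrm{tr}[\mathbf{X}\,\mathrm{tr}(\Sigma^{-1}\Sigma_0)I_n\mathbf{X}'\mathbf{B}'\Sigma^{-1}\mathbf{B}]$
$\le p^{-1}\mathrm{tr}(\Sigma^{-1}\Sigma_0)E(\|\mathbf{X}\mathbf{X}'\|)\|\mathbf{B}'\Sigma^{-1}\mathbf{B}\|$.

In the proof of Lemma 2, it is shown that $E(\|\mathbf{X}\mathbf{X}'\|^2) = O(n^2K^2)$, which implies that

$E(\|\mathbf{X}\mathbf{X}'\|) \le [E(\|\mathbf{X}\mathbf{X}'\|^2)]^{1/2} = O(nK)$.

By (1.3) and assumptions (B) and (C), we can easily get

$\mathrm{tr}(\Sigma^{-1}\Sigma_0) \le \mathrm{tr}(\Sigma^{-1})O(1) = O(p)$,

which along with (6.8) and the above results shows that

$\mathcal{L}_1 = O(nK^{3/2})$.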

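Similarly, by conditioning on $\mathbf{X}$ we have

$\mathcal{L}_2 = p^{-1}E\,\mathrm{tr}[\mathbf{X}\mathbf{1}\mathbf{1}'\mathbf{H}E(\mathbf{E}'\Sigma^{-1}\mathbf{E}|\mathbf{X})\mathbf{H}\mathbf{1}\mathbf{1}'\mathbf{X}'\mathbf{B}'\Sigma^{-1}\mathbf{B}]$
$= p^{-1}E\,\mathrm{tr}[\mathbf{X}\mathbf{1}\mathbf{1}'\mathbf{H}\,\mathrm{tr}(\Sigma^{-1}\Sigma_0)I_n\mathbf{H}\mathbf{1}\mathbf{1}'\mathbf{X}'\mathbf{B}'\Sigma^{-1}\mathbf{B}]$.

Then, applying (A.1)-(A.3) in Lemma 1 gives

$\mathcal{L}_2 \le p^{-1}\mathrm{tr}(\Sigma^{-1}\Sigma_0)E\|\mathbf{X}\mathbf{1}\mathbf{1}'\mathbf{H}\mathbf{1}\mathbf{1}'\mathbf{X}'\|\,\|\mathbf{B}'\Sigma^{-1}\mathbf{B}\|$
$\le p^{-1}\mathrm{tr}(\Sigma^{-1}\Sigma_0)E\{\|\mathbf{H}\|\,\|\mathbf{X}'\mathbf{X}\|\,\|\mathbf{1}\mathbf{1}'\mathbf{1}\mathbf{1}'\|\}\|\mathbf{B}'\Sigma^{-1}\mathbf{B}\|$
$= n^2p^{-1}K^{1/2}\mathrm{tr}(\Sigma^{-1}\Sigma_0)E\|\mathbf{X}'\mathbf{X}\|\,\|\mathbf{B}'\Sigma^{-1}\mathbf{B}\|$,

which together with the above results shows that

$\mathcal{L}_2 = O(n^3K^2)$.

Thus, in view of (6.10) we have

(6.11) $E\|\mathbf{B}\,\widehat{\mathrm{cov}}(\mathbf{f})\,\mathbf{C}_n' + \mathbf{C}_n\,\widehat{\mathrm{cov}}(\mathbf{f})\,\mathbf{B}'\|_{\Sigma}^2 = O(n^{-1}K^2)$.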

The third and fourth terms are examined in Lemmas 4 and 5, respectively. Since
$K \le p$ by assumption (A), combining (6.9) and (6.11) with (A.8) and (A.11) in Lemmas
4 and 5 results in

$E\|\hat{\Sigma} - \Sigma\|_{\Sigma}^2 = O(n^{-1}K^2) + O(n^{-2}pK)$.

In particular, when $K = O(n^{\alpha_1})$ and $p = O(n^{\alpha})$ for some $0 \le \alpha_1 < 1/2$ and $0 \le \alpha <
2 - \alpha_1$, we have

$\|\hat{\Sigma} - \Sigma\|_{\Sigma} = O_P(n^{-\beta/2})$

with $\beta = \min(1 - 2\alpha_1, 2 - \alpha - \alpha_1)$, which proves $n^{\beta/2}$-consistency of the covariance matrix
estimator $\hat{\Sigma}$ under norm $\|\cdot\|_{\Sigma}$.

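(2) Then, we prove the $n^{\beta_1/2}$-consistency of $\hat{\Sigma}_{sam}$ under norm $\|\cdot\|_{\Sigma}$. By the
Cauchy-Schwarz inequality, it follows from (6.6) that

$E\|\hat{\Sigma}_{sam} - \Sigma\|_{\Sigma}^2 \le 4\big[E\|\mathbf{B}\mathbf{D}_n\mathbf{B}'\|_{\Sigma}^2 + E\|\mathbf{G}_n\|_{\Sigma}^2 + 2(n-1)^{-2}E\|\mathbf{B}\mathbf{X}\mathbf{E}'\|_{\Sigma}^2 + 2[n(n-1)]^{-2}E\|\mathbf{B}\mathbf{X}\mathbf{1}\mathbf{1}'\mathbf{E}'\|_{\Sigma}^2\big]$.

As in part (1), we will examine each of the above four terms on the right hand side
separately. The first term $E\|\mathbf{B}\mathbf{D}_n\mathbf{B}'\|_{\Sigma}^2$ has been bounded in (6.9), and the second
term $E\|\mathbf{G}_n\|_{\Sigma}^2$ is considered in Lemma 6. The third term $E\|\mathbf{B}\mathbf{X}\mathbf{E}'\|_{\Sigma}^2$ is exactly $\mathcal{L}_1$ in
part (1) above. Using the same argument that was used in part (1) to bound $\mathcal{L}_2$, we can
easily get

$E\|\mathbf{B}\mathbf{X}\mathbf{1}\mathbf{1}'\mathbf{E}'\|_{\Sigma}^2 = O(n^3K^{3/2})$.

Thus, by (6.9) and (A.12) in Lemma 6 along with the above results, we have

$E\|\hat{\Sigma}_{sam} - \Sigma\|_{\Sigma}^2 = O(n^{-1}p^{-1}K^3) + O(n^{-1}p) + O(n^{-1}K^{3/2})$.

In particular, when $K = O(n^{\alpha_1})$ and $p = O(n^{\alpha})$ for some $0 \le \alpha < 1$ and $0 \le \alpha_1 <
(1 + \alpha)/3$, we have

$\|\hat{\Sigma}_{sam} - \Sigma\|_{\Sigma} = O_P(n^{-\beta_1/2})$

with $\beta_1 = 1 - \max(\alpha, 3\alpha_1/2, 3\alpha_1 - \alpha)$, which shows $n^{\beta_1/2}$-consistency of $\hat{\Sigma}_{sam}$ under
norm $\|\cdot\|_{\Sigma}$. □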

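PROOF of Theorem 3. (1) First, we prove the weak convergence of $\hat{\Sigma}_{sam}^{-1}$ under
the Frobenius norm. Note that $\hat{\Sigma}_{sam}$ involves sample covariance matrix estimation of
$\Sigma_0$, so the technique in part (2) below does not help. In general, the only available way
is as follows. We define $\mathbf{Q}_n = \hat{\Sigma}_{sam} - \Sigma_n$. It is a basic fact in matrix theory that

(6.12) $\|\hat{\Sigma}_{sam}^{-1} - \Sigma_n^{-1}\| \le \|\Sigma_n^{-1}\|\,\dfrac{\|\Sigma_n^{-1}\mathbf{Q}_n\|}{1 - \|\Sigma_n^{-1}\mathbf{Q}_n\|} \le \|\Sigma_n^{-1}\|\,\dfrac{\|\Sigma_n^{-1}\|\,\|\mathbf{Q}_n\|}{1 - \|\Sigma_n^{-1}\|\,\|\mathbf{Q}_n\|}$

whenever $\|\Sigma_n^{-1}\|\,\|\mathbf{Q}_n\| < 1$. From Theorem 1, we know that

$\|\mathbf{Q}_n\| = O_P(n^{-1/2}pK)$.

By (A.9), we have $\|\Sigma_n^{-1}\| = O(p^{1/2})$. Since $pK^{1/2} = o((n/\log n)^{1/4})$ we see that

$\|\Sigma_n^{-1}\|\,\|\mathbf{Q}_n\| \overset{P}{\to} 0$ and $\sqrt{np^{-2}K^{-2}/\log n}\,\|\mathbf{Q}_n\| \overset{P}{\to} 0$.

It follows easily that

$\sqrt{np^{-4}K^{-2}/\log n}\,\|\Sigma_n^{-1}\|\,\dfrac{\|\Sigma_n^{-1}\|\,\|\mathbf{Q}_n\|}{1 - \|\Sigma_n^{-1}\|\,\|\mathbf{Q}_n\|} \overset{P}{\to} 0$,

which along with (6.12) shows that

$\sqrt{np^{-4}K^{-2}/\log n}\,\|\hat{\Sigma}_{sam}^{-1} - \Sigma^{-1}\| \overset{P}{\to} 0$ as $n \to \infty$.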
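
The perturbation bound (6.12) for the inverse can be checked on a small example. The sketch below uses the Frobenius norm throughout, as in the proof; the matrices are arbitrary and chosen so that $\|\Sigma^{-1}\|\,\|\mathbf{Q}\| < 1$ holds.

```python
import numpy as np


def fro(M):
    """Frobenius norm."""
    return np.linalg.norm(M, "fro")


rng = np.random.default_rng(4)
p = 6
A = rng.normal(size=(p, p))
S = A @ A.T / p + 2.0 * np.eye(p)        # well-conditioned positive definite matrix
P = rng.normal(size=(p, p))
Q = 0.01 * (P + P.T)                     # small symmetric perturbation

S_inv = np.linalg.inv(S)
lhs = fro(np.linalg.inv(S + Q) - S_inv)
mid = fro(S_inv) * fro(S_inv @ Q) / (1.0 - fro(S_inv @ Q))
rhs = fro(S_inv) ** 2 * fro(Q) / (1.0 - fro(S_inv) * fro(Q))
```

The bound degrades quickly as `fro(S_inv) * fro(Q)` approaches one, which is why the proof needs the product to vanish in probability.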



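(2) Then, we show the weak convergence of $\hat{\Sigma}^{-1}$ under the Frobenius norm. The
basic idea is to examine the estimation error for each term of $\hat{\Sigma}^{-1}$, which has an explicit
form thanks to the factor structure. From (1.4), we know that $\hat{\Sigma} = \hat{\mathbf{B}}\,\widehat{\mathrm{cov}}(\mathbf{f})\,\hat{\mathbf{B}}' + \hat{\Sigma}_0$,
which along with the Sherman-Morrison-Woodbury formula shows that

(6.13) $\hat{\Sigma}^{-1} = \hat{\Sigma}_0^{-1} - \hat{\Sigma}_0^{-1}\hat{\mathbf{B}}[\widehat{\mathrm{cov}}(\mathbf{f})^{-1} + \hat{\mathbf{B}}'\hat{\Sigma}_0^{-1}\hat{\mathbf{B}}]^{-1}\hat{\mathbf{B}}'\hat{\Sigma}_0^{-1}$.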
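Thus by (6.7), we have

(6.14) $\|\hat{\Sigma}^{-1} - \Sigma^{-1}\| \le \|\hat{\Sigma}_0^{-1} - \Sigma_0^{-1}\|$
$+ \|(\hat{\Sigma}_0^{-1} - \Sigma_0^{-1})\hat{\mathbf{B}}[\widehat{\mathrm{cov}}(\mathbf{f})^{-1} + \hat{\mathbf{B}}'\hat{\Sigma}_0^{-1}\hat{\mathbf{B}}]^{-1}\hat{\mathbf{B}}'\hat{\Sigma}_0^{-1}\|$
$+ \|\Sigma_0^{-1}\hat{\mathbf{B}}[\widehat{\mathrm{cov}}(\mathbf{f})^{-1} + \hat{\mathbf{B}}'\hat{\Sigma}_0^{-1}\hat{\mathbf{B}}]^{-1}\hat{\mathbf{B}}'(\hat{\Sigma}_0^{-1} - \Sigma_0^{-1})\|$
$+ \|\Sigma_0^{-1}(\hat{\mathbf{B}} - \mathbf{B})[\widehat{\mathrm{cov}}(\mathbf{f})^{-1} + \hat{\mathbf{B}}'\hat{\Sigma}_0^{-1}\hat{\mathbf{B}}]^{-1}\hat{\mathbf{B}}'\Sigma_0^{-1}\|$
$+ \|\Sigma_0^{-1}\mathbf{B}[\widehat{\mathrm{cov}}(\mathbf{f})^{-1} + \hat{\mathbf{B}}'\hat{\Sigma}_0^{-1}\hat{\mathbf{B}}]^{-1}(\hat{\mathbf{B}}' - \mathbf{B}')\Sigma_0^{-1}\|$
$+ \|\Sigma_0^{-1}\mathbf{B}\{[\widehat{\mathrm{cov}}(\mathbf{f})^{-1} + \hat{\mathbf{B}}'\hat{\Sigma}_0^{-1}\hat{\mathbf{B}}]^{-1} - [\mathrm{cov}(\mathbf{f})^{-1} + \mathbf{B}'\Sigma_0^{-1}\mathbf{B}]^{-1}\}\mathbf{B}'\Sigma_0^{-1}\|$
$= \mathcal{K}_1 + \mathcal{K}_2 + \mathcal{K}_3 + \mathcal{K}_4 + \mathcal{K}_5 + \mathcal{K}_6$.

To study $\|\hat{\Sigma}^{-1} - \Sigma^{-1}\|$, we need to examine each of the above six terms $\mathcal{K}_1, \dots, \mathcal{K}_6$
separately, so it would be lengthy work to check all the details here. Therefore, we only
sketch the idea of the proof and leave the details to the reader.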

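From assumption (C), we know that the diagonal entries of $\Sigma_0$ are bounded away
from 0. Note that $\hat{\Sigma}_0$ and $\Sigma_0$ are both diagonal, and thus, by the same argument as in
Lemma 5, we can easily show that

(6.15) $\mathcal{K}_1 = O_P(n^{-1/2}p^{1/2}) + O_P(n^{-1}pK^{1/2}) = O_P(n^{-1/2}p^{1/2})$

since $pK^{1/2} = o((n/\log n)^{1/2})$. Now we consider the second term $\mathcal{K}_2$. By (A.1) in
Lemma 1, we have

$\mathcal{K}_2 \le \|(\hat{\Sigma}_0^{-1} - \Sigma_0^{-1})\hat{\Sigma}_0^{1/2}\|\,\|\hat{\Sigma}_0^{-1/2}\hat{\mathbf{B}}[\widehat{\mathrm{cov}}(\mathbf{f})^{-1} + \hat{\mathbf{B}}'\hat{\Sigma}_0^{-1}\hat{\mathbf{B}}]^{-1}\hat{\mathbf{B}}'\hat{\Sigma}_0^{-1/2}\|\,\|\hat{\Sigma}_0^{-1/2}\| = \mathcal{L}_1\mathcal{L}_2\|\hat{\Sigma}_0^{-1/2}\|$,

and we will examine each of the above two terms $\mathcal{L}_1$ and $\mathcal{L}_2$, as well as $\|\hat{\Sigma}_0^{-1/2}\|$. Since
$\hat{\Sigma}_0$ and $\Sigma_0$ are diagonal, a similar argument to that bounding $\mathcal{K}_1$ above applies to show
that

$\|\hat{\Sigma}_0^{-1/2}\| = O_P(p^{1/2})$ and $\mathcal{L}_1 = O_P(n^{-1/2}p^{1/2})$.

Clearly, $\hat{\Sigma}_0^{-1/2}\hat{\mathbf{B}}[\widehat{\mathrm{cov}}(\mathbf{f})^{-1} + \hat{\mathbf{B}}'\hat{\Sigma}_0^{-1}\hat{\mathbf{B}}]^{-1}\hat{\mathbf{B}}'\hat{\Sigma}_0^{-1/2}$ is symmetric positive semidefinite with
rank at most K and $\hat{\Sigma}_0^{1/2}\hat{\Sigma}^{-1}\hat{\Sigma}_0^{1/2} \ge 0$. Thus it follows from (6.13) that

$\hat{\Sigma}_0^{-1/2}\hat{\mathbf{B}}[\widehat{\mathrm{cov}}(\mathbf{f})^{-1} + \hat{\mathbf{B}}'\hat{\Sigma}_0^{-1}\hat{\mathbf{B}}]^{-1}\hat{\mathbf{B}}'\hat{\Sigma}_0^{-1/2} = I_p - \hat{\Sigma}_0^{1/2}\hat{\Sigma}^{-1}\hat{\Sigma}_0^{1/2} \le I_p$,

which implies that $\hat{\Sigma}_0^{-1/2}\hat{\mathbf{B}}[\widehat{\mathrm{cov}}(\mathbf{f})^{-1} + \hat{\mathbf{B}}'\hat{\Sigma}_0^{-1}\hat{\mathbf{B}}]^{-1}\hat{\mathbf{B}}'\hat{\Sigma}_0^{-1/2}$ has at most K positive
eigenvalues and all of them are bounded by one. This shows that $\mathcal{L}_2 \le K^{1/2}$, which
along with the above results gives

(6.16) $\mathcal{K}_2 = O_P(n^{-1/2}pK^{1/2})$.

Similarly, we can also show that

(6.17) $\mathcal{K}_3 = O_P(n^{-1/2}pK^{1/2})$.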
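Then we consider terms $\mathcal{K}_4$ and $\mathcal{K}_5$. Clearly, $\widehat{\mathrm{cov}}(\mathbf{f})^{-1} + \hat{\mathbf{B}}'\hat{\Sigma}_0^{-1}\hat{\mathbf{B}} \ge \widehat{\mathrm{cov}}(\mathbf{f})^{-1}$, which
in turn entails that $[\widehat{\mathrm{cov}}(\mathbf{f})^{-1} + \hat{\mathbf{B}}'\hat{\Sigma}_0^{-1}\hat{\mathbf{B}}]^{-1} \le \widehat{\mathrm{cov}}(\mathbf{f})$ and

$\|[\widehat{\mathrm{cov}}(\mathbf{f})^{-1} + \hat{\mathbf{B}}'\hat{\Sigma}_0^{-1}\hat{\mathbf{B}}]^{-1}\| \le \|\widehat{\mathrm{cov}}(\mathbf{f})\|$.

It is easy to show that $\|\widehat{\mathrm{cov}}(\mathbf{f})\| = O_P(K)$. Thus we have

(6.18) $\mathcal{K}_4 = O_P(n^{-1/2}pK)$

and

(6.19) $\mathcal{K}_5 = O_P(n^{-1/2}pK)$.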
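Finally, by the same argument as in part (1) above, we can show that

$\|[\widehat{\mathrm{cov}}(\mathbf{f})^{-1} + \hat{\mathbf{B}}'\hat{\Sigma}_0^{-1}\hat{\mathbf{B}}]^{-1} - [\mathrm{cov}(\mathbf{f})^{-1} + \mathbf{B}'\Sigma_0^{-1}\mathbf{B}]^{-1}\| = o_P((n/\log n)^{-1/2}K^2)$.

Thus by (A.2) in Lemma 1, we have

(6.20) $\mathcal{K}_6 \le \|[\widehat{\mathrm{cov}}(\mathbf{f})^{-1} + \hat{\mathbf{B}}'\hat{\Sigma}_0^{-1}\hat{\mathbf{B}}]^{-1} - [\mathrm{cov}(\mathbf{f})^{-1} + \mathbf{B}'\Sigma_0^{-1}\mathbf{B}]^{-1}\|\,\|\mathbf{B}'\Sigma_0^{-2}\mathbf{B}\|$
$= o_P((n/\log n)^{-1/2}K^2)\,O(p) = o_P((n/\log n)^{-1/2}pK^2)$.

Therefore, it follows from (6.15)-(6.20) that

$\sqrt{np^{-2}K^{-4}/\log n}\,\|\hat{\Sigma}^{-1} - \Sigma^{-1}\| \overset{P}{\to} 0$ as $n \to \infty$,

which completes the proof. □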
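Proof of Theorem 4. We aim at establishing asymptotic normality of the $K \times K$
matrix $\sqrt{n}\,p^{-2}\mathbf{B}'(\hat{\Sigma} - \Sigma)\mathbf{B}$, and only here are the K factors $f_1, \dots, f_K$ assumed fixed
across n. The basic idea is to use its four-term decomposition below and to show that
the first term has asymptotic normality by the classical central limit theorem, while the
remaining three terms are all negligible, say $o_P(1)$, which along with Slutsky's theorem
leads to the desired conclusion. In view of (6.1), we have

$\sqrt{n}\,p^{-2}\mathbf{B}'(\hat{\Sigma} - \Sigma)\mathbf{B} = \sqrt{n}\,p^{-2}\mathbf{B}'\mathbf{B}\mathbf{D}_n\mathbf{B}'\mathbf{B} + \sqrt{n}\,p^{-2}\mathbf{B}'\{\mathbf{B}\,\widehat{\mathrm{cov}}(\mathbf{f})\,\mathbf{C}_n' + \mathbf{C}_n\,\widehat{\mathrm{cov}}(\mathbf{f})\,\mathbf{B}'\}\mathbf{B}$
$+ \sqrt{n}\,p^{-2}\mathbf{B}'\mathbf{C}_n\,\widehat{\mathrm{cov}}(\mathbf{f})\,\mathbf{C}_n'\mathbf{B} + \sqrt{n}\,p^{-2}\mathbf{B}'\mathbf{F}_n\mathbf{B}$

(6.21) $= \mathcal{A}_1 + \mathcal{A}_2 + \mathcal{A}_3 + \mathcal{A}_4$.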

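We will study each of the above four terms $\mathcal{A}_1, \dots, \mathcal{A}_4$ separately.
First, we consider the term $\mathcal{A}_1$. Define

$\mathcal{H}_n = n(n-1)^{-1}(\bar{\mathbf{f}} - E\mathbf{f})(\bar{\mathbf{f}} - E\mathbf{f})'$, where $\bar{\mathbf{f}} = n^{-1}\sum_{i=1}^{n}\mathbf{f}_i$.

Then we have

(6.22) $\widehat{\mathrm{cov}}(\mathbf{f}) = (n-1)^{-1}\sum_{i=1}^{n}(\mathbf{f}_i - E\mathbf{f})(\mathbf{f}_i - E\mathbf{f})' - \mathcal{H}_n$.

By the classical central limit theorem, we know that

$\sqrt{n}\Big(n^{-1}\sum_{i=1}^{n}\mathbf{f}_i - E\mathbf{f}\Big) \overset{D}{\to} N(0, \mathrm{cov}(\mathbf{f}))$.

It follows from the law of large numbers that $n^{-1}\sum_{i=1}^{n}\mathbf{f}_i - E\mathbf{f} \overset{P}{\to} 0$. Thus, by Slutsky's
theorem we have $\sqrt{n}\,\mathcal{H}_n \overset{D}{\to} 0$, which in turn implies that

$\sqrt{n}\,\mathcal{H}_n \overset{P}{\to} 0$;

that is, $\mathcal{H}_n = o_P(n^{-1/2})$. So in view of (6.22), we have

(6.23) $\widehat{\mathrm{cov}}(\mathbf{f}) = n^{-1}\sum_{i=1}^{n}(\mathbf{f}_i - E\mathbf{f})(\mathbf{f}_i - E\mathbf{f})' + o_P(n^{-1/2})$.

Therefore, it follows easily from $p^{-1}\mathbf{B}_n'\mathbf{B}_n \to \mathbf{A}$ and (6.23) that

(6.24) $\mathcal{A}_1 = \mathbf{A}\Big\{n^{-1/2}\sum_{i=1}^{n}[(\mathbf{f}_i - E\mathbf{f})(\mathbf{f}_i - E\mathbf{f})' - \mathrm{cov}(\mathbf{f})]\Big\}\mathbf{A} + o_P(1)$.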
We define

$\mathbf{U}_n = n^{-1/2}\sum_{i=1}^{n}[(\mathbf{f}_i - E\mathbf{f})(\mathbf{f}_i - E\mathbf{f})' - \mathrm{cov}(\mathbf{f})] = (u_{ij})_{K \times K}$.

By the classical central limit theorem, we know that [see, e.g. Muirhead (1982)]

(6.25) $\mathrm{vech}(\mathbf{U}_n) \overset{D}{\to} N(0, H)$,

where $H$ is determined in an obvious way by

$\mathrm{cov}(u_{ij}, u_{kl}) = \kappa^{ijkl} + \kappa^{ik}\kappa^{jl} + \kappa^{il}\kappa^{jk}$,

with $\kappa^{i_1 \cdots i_r}$ the central moment $E[(f_{i_1} - Ef_{i_1}) \cdots (f_{i_r} - Ef_{i_r})]$ of $\mathbf{f} = (f_1, \dots, f_K)'$. It
follows easily from (6.24) and (6.25) that

(6.26) $\mathrm{vech}(\mathcal{A}_1) \overset{D}{\to} N(0, G)$,

where $G = P_D(\mathbf{A} \otimes \mathbf{A})DHD'(\mathbf{A} \otimes \mathbf{A})P_D'$, $D$ is the duplication matrix of order K, and
$P_D = (D'D)^{-1}D'$.
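
The role of the duplication matrix in the formula for G can be made concrete with a short sketch. It builds D of order K, forms $P_D = (D'D)^{-1}D'$, and checks that $P_D(\mathbf{A} \otimes \mathbf{A})D$ maps $\mathrm{vech}(\mathbf{U})$ to $\mathrm{vech}(\mathbf{A}\mathbf{U}\mathbf{A})$ for symmetric inputs. The helper names and the column-major vech ordering are assumptions made for illustration.

```python
import numpy as np


def vech(M):
    """Stack the lower-triangular part of M column by column."""
    K = M.shape[0]
    return np.concatenate([M[j:, j] for j in range(K)])


def duplication_matrix(K):
    """K^2 x K(K+1)/2 matrix D with D @ vech(M) = vec(M) for symmetric M,
    where vec stacks columns (column-major order)."""
    D = np.zeros((K * K, K * (K + 1) // 2))
    col = 0
    for j in range(K):
        for i in range(j, K):
            D[j * K + i, col] = 1.0      # entry (i, j) of M
            D[i * K + j, col] = 1.0      # entry (j, i) of M
            col += 1
    return D


K = 3
rng = np.random.default_rng(6)
R = rng.normal(size=(K, K))
U = R + R.T                               # symmetric "U_n"
T = rng.normal(size=(K, K))
A = T @ T.T                               # symmetric limit matrix "A"

D = duplication_matrix(K)
P_D = np.linalg.inv(D.T @ D) @ D.T
M = P_D @ np.kron(A, A) @ D               # linear map acting on vech(U)
```

With this map, the covariance of $\mathrm{vech}(\mathbf{A}\mathbf{U}_n\mathbf{A})$ is `M @ H @ M.T`, which matches the form of G stated above.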

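Then, we examine the second term $\mathcal{A}_2$. From $p^{-1}\mathbf{B}_n'\mathbf{B}_n \to \mathbf{A}$, we know that

(6.27) $\|\mathbf{B}_n'\mathbf{B}_n\| = \|\mathbf{B}_n\mathbf{B}_n'\| = O(p)$,

which is in line with (6.3). It follows that

(6.28) $\|\mathcal{A}_2\| \le 2\|\sqrt{n}\,p^{-2}\mathbf{B}'\mathbf{B}\,\widehat{\mathrm{cov}}(\mathbf{f})\,\mathbf{C}_n'\mathbf{B}\| \le 2n^{1/2}p^{-2}\|\mathbf{B}'\mathbf{B}\|\,\|\widehat{\mathrm{cov}}(\mathbf{f})\,\mathbf{C}_n'\mathbf{B}\|$
$\le 2n^{1/2}p^{-2}\|\mathbf{B}'\mathbf{B}\|\{(n-1)^{-1}\|\mathbf{X}\mathbf{E}'\mathbf{B}\| + n^{-1}(n-1)^{-1}\|\mathbf{X}\mathbf{1}\mathbf{1}'\mathbf{H}\mathbf{E}'\mathbf{B}\|\}$
$= O(n^{-1/2}p^{-1})\|\mathbf{X}\mathbf{E}'\mathbf{B}\| + O(n^{-3/2}p^{-1})\|\mathbf{X}\mathbf{1}\mathbf{1}'\mathbf{H}\mathbf{E}'\mathbf{B}\|$.

Since $E(\varepsilon|\mathbf{f}) = 0$ and $\Sigma_0$ is diagonal, conditioning on $\mathbf{X}$ gives

$E\|\mathbf{X}\mathbf{E}'\mathbf{B}\|^2 = E\,\mathrm{tr}[\mathbf{X}E(\mathbf{E}'\mathbf{B}\mathbf{B}'\mathbf{E}|\mathbf{X})\mathbf{X}'] = E\,\mathrm{tr}[\mathbf{X}\,\mathrm{tr}(\mathbf{B}\mathbf{B}'\Sigma_0)I_n\mathbf{X}']$
$= \mathrm{tr}(\mathbf{B}\mathbf{B}'\Sigma_0)E\|\mathbf{X}\|^2 = O(p)O(n) = O(np)$.

Similarly, by conditioning on $\mathbf{X}$ we have

$E\|\mathbf{X}\mathbf{1}\mathbf{1}'\mathbf{H}\mathbf{E}'\mathbf{B}\|^2 = E\,\mathrm{tr}[\mathbf{X}\mathbf{1}\mathbf{1}'\mathbf{H}E(\mathbf{E}'\mathbf{B}\mathbf{B}'\mathbf{E}|\mathbf{X})\mathbf{H}\mathbf{1}\mathbf{1}'\mathbf{X}']$
$= E\,\mathrm{tr}[\mathbf{X}\mathbf{1}\mathbf{1}'\mathbf{H}\,\mathrm{tr}(\mathbf{B}\mathbf{B}'\Sigma_0)I_n\mathbf{H}\mathbf{1}\mathbf{1}'\mathbf{X}']$,

and then applying (A.2) and (A.3) in Lemma 1 yields

$E\|\mathbf{X}\mathbf{1}\mathbf{1}'\mathbf{H}\mathbf{E}'\mathbf{B}\|^2 \le \mathrm{tr}(\mathbf{B}\mathbf{B}'\Sigma_0)E\{\|\mathbf{X}'\mathbf{X}\|\,\|\mathbf{1}\mathbf{1}'\mathbf{1}\mathbf{1}'\|\,\|\mathbf{H}\|\}$
$\le O(p)\,n^2K^{1/2}\{E(\|\mathbf{X}'\mathbf{X}\|^2)\}^{1/2} = O(n^3p)$.

It follows that $\|\mathbf{X}\mathbf{E}'\mathbf{B}\| = O_P(n^{1/2}p^{1/2})$ and $\|\mathbf{X}\mathbf{1}\mathbf{1}'\mathbf{H}\mathbf{E}'\mathbf{B}\| = O_P(n^{3/2}p^{1/2})$, which
together with (6.28) shows that

(6.29) $\mathcal{A}_2 = o_P(1)$;

that is, $\mathcal{A}_2$ is a negligible term.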

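Finally, the third and fourth terms $\mathcal{A}_3$ and $\mathcal{A}_4$ can also be shown to be negligible by
invoking Lemma 3. By (6.27) and (A.6) and (A.7) in Lemma 3, we have

$E\|\mathbf{B}'\mathbf{C}_n\,\widehat{\mathrm{cov}}(\mathbf{f})\,\mathbf{C}_n'\mathbf{B}\|^2 \le \|\mathbf{B}\mathbf{B}'\|^2E\|\mathbf{C}_n\,\widehat{\mathrm{cov}}(\mathbf{f})\,\mathbf{C}_n'\|^2 = O(p^2)O(n^{-2}p^2) = O(n^{-2}p^4)$

and

$E\|\mathbf{B}'\mathbf{F}_n\mathbf{B}\|^2 \le \|\mathbf{B}\mathbf{B}'\|^2E\|\mathbf{F}_n\|^2 = O(p^2)O(n^{-1}p) = O(n^{-1}p^3)$.

It follows that $\|\mathbf{B}'\mathbf{C}_n\,\widehat{\mathrm{cov}}(\mathbf{f})\,\mathbf{C}_n'\mathbf{B}\| = O_P(n^{-1}p^2)$ and $\|\mathbf{B}'\mathbf{F}_n\mathbf{B}\| = O_P(n^{-1/2}p^{3/2})$, which
implies that

(6.30) $\mathcal{A}_3 = o_P(1)$ and $\mathcal{A}_4 = o_P(1)$.

Therefore, in view of (6.26), (6.29), and (6.30), applying Slutsky's theorem gives

$\sqrt{n}\,\mathrm{vech}\big[p^{-2}\mathbf{B}_n'(\hat{\Sigma}_n - \Sigma_n)\mathbf{B}_n\big] \overset{D}{\to} N(0, G)$,

which proves the asymptotic normality of the covariance matrix estimator $\hat{\Sigma}$. □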

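Proof of Theorem 5. (1) First, we prove the weak convergence of the estimated
global minimum variance based on $\hat{\Sigma}$. From Theorem 3, we know that

$\sqrt{np^{-2}K^{-4}/\log n}\,\|\hat{\Sigma}^{-1} - \Sigma^{-1}\| \overset{P}{\to} 0$.

Note that

$|\hat{\varphi}_n - \varphi_n| = |\mathbf{1}'[\hat{\Sigma}^{-1} - \Sigma^{-1}]\mathbf{1}| = |\mathrm{tr}\{[\hat{\Sigma}^{-1} - \Sigma^{-1}]\mathbf{1}\mathbf{1}'\}| \le \|\hat{\Sigma}^{-1} - \Sigma^{-1}\|\,\|\mathbf{1}\mathbf{1}'\| = p\,\|\hat{\Sigma}^{-1} - \Sigma^{-1}\|$.

Thus we have

$\sqrt{n(pK)^{-4}/\log n}\,|\hat{\varphi}_n - \varphi_n| \overset{P}{\to} 0$.

Since all the $\varphi_n$'s are bounded away from zero, it follows easily that

$\sqrt{n(pK)^{-4}/\log n}\,|\hat{\xi}_{ng}'\hat{\Sigma}_n\hat{\xi}_{ng} - \xi_{ng}'\Sigma_n\xi_{ng}| = \sqrt{n(pK)^{-4}/\log n}\,|\hat{\varphi}_n^{-1} - \varphi_n^{-1}| \overset{P}{\to} 0$.

(2) Then, we prove the conclusion for $\hat{\Sigma}_{sam}$. From Theorem 3, we know that

$\sqrt{np^{-4}K^{-2}/\log n}\,\|\hat{\Sigma}_{sam}^{-1} - \Sigma^{-1}\| \overset{P}{\to} 0$.

Therefore, the above argument in part (1) applies to show that

$\sqrt{np^{-6}K^{-2}/\log n}\,|\hat{\xi}_{ng}'\hat{\Sigma}_{sam}\hat{\xi}_{ng} - \xi_{ng}'\Sigma_n\xi_{ng}| = \sqrt{np^{-6}K^{-2}/\log n}\,|\hat{\varphi}_n^{-1} - \varphi_n^{-1}| \overset{P}{\to} 0$. □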
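
The quantity $\varphi_n = \mathbf{1}'\Sigma^{-1}\mathbf{1}$ and the identity between the global minimum variance and $\varphi_n^{-1}$ used above can be illustrated as follows. The function name is hypothetical, and the closed-form weights $\Sigma^{-1}\mathbf{1}/(\mathbf{1}'\Sigma^{-1}\mathbf{1})$ are the standard solution of the fully invested minimum variance problem, stated here as an assumption since the portfolio definition lies outside this section.

```python
import numpy as np


def global_min_variance(S):
    """Weights and variance of the fully invested minimum variance portfolio
    for a positive definite covariance matrix S."""
    ones = np.ones(S.shape[0])
    w = np.linalg.solve(S, ones)          # S^{-1} 1 without forming the inverse
    phi = ones @ w                        # phi = 1' S^{-1} 1
    weights = w / phi
    return weights, 1.0 / phi


S_example = np.diag([1.0, 2.0])
weights, variance = global_min_variance(S_example)
```

Plugging an estimate of the covariance matrix into this function gives the estimated global minimum variance whose error is controlled by $|\hat{\varphi}_n^{-1} - \varphi_n^{-1}|$.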



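Proof of Theorem 6. (1) First, we prove the weak convergence of the estimated
variance of the optimal portfolio based on $\hat{\Sigma}$. From Theorem 3, we know that

(6.31) $\sqrt{np^{-2}K^{-4}/\log n}\,\|\hat{\Sigma}^{-1} - \Sigma^{-1}\| \overset{P}{\to} 0$,

and from part (1) in the proof of Theorem 5, we see that

(6.32) $\sqrt{n(pK)^{-4}/\log n}\,|\hat{\varphi}_n - \varphi_n| \overset{P}{\to} 0$.

Now we show the same rate for $|\hat{\psi}_n - \psi_n|$, say

(6.33) $\sqrt{n(pK)^{-4}/\log n}\,|\hat{\psi}_n - \psi_n| \overset{P}{\to} 0$.

By $b_n = O(p)$ in assumption (B), a routine calculation yields $\|\mu_n\| = O(p^{1/2})$ and
$E\|\hat{\mu}_n - \mu_n\|^2 = O(n^{-1}p)$, and thus

$\|\hat{\mu}_n - \mu_n\| = O_P(n^{-1/2}p^{1/2})$.

It follows that $\|\hat{\mu}_n\| = O(p^{1/2}) + O_P(n^{-1/2}p^{1/2})$. Then we have

$|\hat{\psi}_n - \psi_n| \le |\mathbf{1}'(\hat{\Sigma}^{-1} - \Sigma^{-1})\hat{\mu}| + |\mathbf{1}'\Sigma^{-1}(\hat{\mu} - \mu)|$
$\le \|\mathbf{1}\|\,\|\hat{\Sigma}^{-1} - \Sigma^{-1}\|\,\|\hat{\mu}\| + \|\mathbf{1}\|\,\|\Sigma^{-1}\|\,\|\hat{\mu} - \mu\|$
$\le p^{1/2}\|\hat{\Sigma}^{-1} - \Sigma^{-1}\|\,[O(p^{1/2}) + O_P(n^{-1/2}p^{1/2})] + p^{1/2}O(p^{1/2})O_P(n^{-1/2}p^{1/2})$
$= \|\hat{\Sigma}^{-1} - \Sigma^{-1}\|\,[O(p) + O_P(n^{-1/2}p)] + O_P(n^{-1/2}p^{3/2})$,

which together with (6.31) proves (6.33). Similarly, we can also show that

(6.34) $\sqrt{n(pK)^{-4}/\log n}\,|\hat{\phi}_n - \phi_n| \overset{P}{\to} 0$.

Since $\varphi_n\phi_n - \psi_n^2$ are bounded away from zero and $\varphi_n/(\varphi_n\phi_n - \psi_n^2)$, $\psi_n/(\varphi_n\phi_n - \psi_n^2)$,
$\phi_n/(\varphi_n\phi_n - \psi_n^2)$, $\gamma_n$ are bounded, the conclusion follows from (3.3) and (6.32)-(6.34).

(2) Now we prove the conclusion for $\hat{\Sigma}_{sam}$. From Theorem 3, we know that

$\sqrt{np^{-4}K^{-2}/\log n}\,\|\hat{\Sigma}_{sam}^{-1} - \Sigma^{-1}\| \overset{P}{\to} 0$,

and from part (2) in the proof of Theorem 5, we see that

$\sqrt{np^{-6}K^{-2}/\log n}\,|\hat{\varphi}_n - \varphi_n| \overset{P}{\to} 0$.

Since $b_n = O(p)$ by assumption (B), a routine calculation shows that

$\|\hat{\mu}_{sam} - \mu_n\| = O_P(n^{-1/2}p^{1/2})$,

where $\hat{\mu}_{sam}$ is the sample mean estimator of $\mu_n$. Therefore, the argument in part (1) above applies
to show that

$\sqrt{np^{-6}K^{-2}/\log n}\,|\hat{\xi}_n'\hat{\Sigma}_{sam}\hat{\xi}_n - \xi_n'\Sigma_n\xi_n| \overset{P}{\to} 0$ as $n \to \infty$. □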



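PROOF of Theorem 7. Since $\xi_n = O(1)\mathbf{1}$, the conclusion follows easily from
consistency results of $\hat{\Sigma}$ and $\hat{\Sigma}_{sam}$ under the Frobenius norm in Theorem 1. In particular,
when the portfolios $\xi_n = (\xi_1, \dots, \xi_p)'$ have no short positions, we have

$\|\xi_n\| = \sqrt{\xi_1^2 + \cdots + \xi_p^2} \le \xi_1 + \cdots + \xi_p = 1$.

It therefore follows easily that

$\sqrt{n(pK)^{-2}/\log n}\,|\xi_n'\hat{\Sigma}\xi_n - \xi_n'\Sigma_n\xi_n| \overset{P}{\to} 0$ as $n \to \infty$

and

$\sqrt{n(pK)^{-2}/\log n}\,|\xi_n'\hat{\Sigma}_{sam}\xi_n - \xi_n'\Sigma_n\xi_n| \overset{P}{\to} 0$ as $n \to \infty$. □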



APPENDIX 

Throughout the paper, we denote by $\mathbf{H}$ the $n \times n$ hat matrix $\mathbf{X}'(\mathbf{X}\mathbf{X}')^{-1}\mathbf{X}$, which
is symmetric and positive semidefinite with probability one by assumption (A).

Lemma 1 (Basic facts).
(i) For any $q \times r$ matrix $A_1$ and $r \times q$ matrix $A_2$, we have

(A.1) $|\mathrm{tr}(A_1A_2)| \le \|A_1\|\,\|A_2\|$ and $\|A_1A_2\| \le \|A_1\|\,\|A_2\|$.

In particular, for any $q \times r$ matrix $A_1$ and $r \times r$ symmetric matrix $A_2$, we have

(A.2) $|\mathrm{tr}(A_1A_2A_1')| \le \|A_1'A_1\|\,\|A_2\|$ and $\|A_1A_2A_1'\| \le \|A_1'A_1\|\,\|A_2\|$.

(ii) With probability one, the hat matrix $\mathbf{H}$ is idempotent with

(A.3) $\mathrm{tr}(\mathbf{H}^2) = \mathrm{tr}(\mathbf{H}) = K$,

and it satisfies

(A.4) $0 \le \mathrm{tr}(\mathbf{H}\mathbf{1}\mathbf{1}'\mathbf{H}) \le K^{1/2}n$ and $0 \le \mathrm{tr}[(\mathbf{H}\mathbf{1}\mathbf{1}'\mathbf{H})^2] \le Kn^2$.

Proof. One can refer to Horn and Johnson (1990) for standard proofs of (A.1) and
(A.2). The fact that the hat matrix $\mathbf{H}$ is idempotent with (A.3) is known in multivariate
statistical analysis. Clearly, $\mathrm{tr}(\mathbf{H}\mathbf{1}\mathbf{1}'\mathbf{H}) = \mathbf{1}'\mathbf{H}\mathbf{1} \ge 0$. Thus by (A.1) and (A.3), we have

$\mathrm{tr}(\mathbf{H}\mathbf{1}\mathbf{1}'\mathbf{H}) = \mathrm{tr}(\mathbf{H}\mathbf{1}\mathbf{1}') \le \|\mathbf{H}\|\,\|\mathbf{1}\mathbf{1}'\| = K^{1/2}n$

and

$\mathrm{tr}[(\mathbf{H}\mathbf{1}\mathbf{1}'\mathbf{H})^2] = \mathrm{tr}[(\mathbf{H}\mathbf{1}\mathbf{1}')^2] \le \|\mathbf{H}\mathbf{1}\mathbf{1}'\|^2 \le \|\mathbf{H}\|^2\|\mathbf{1}\mathbf{1}'\|^2 = Kn^2$.

This completes the proof. □
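
The hat-matrix facts (A.3) and (A.4) are easy to confirm numerically. The sketch below uses a randomly generated $K \times n$ factor matrix, which is an arbitrary stand-in for $\mathbf{X}$, and evaluates the quantities that appear in the lemma.

```python
import numpy as np

rng = np.random.default_rng(5)
K, n = 3, 40
X = rng.normal(size=(K, n))                   # K x n matrix of factor observations
H = X.T @ np.linalg.inv(X @ X.T) @ X          # n x n hat matrix
one = np.ones(n)

trace_H = np.trace(H)                         # equals K by (A.3)
fro_H = np.linalg.norm(H, "fro")              # equals K^{1/2} since H is idempotent
quad = one @ H @ one                          # tr(H 1 1' H) = 1' H 1
M = H @ np.outer(one, one) @ H
trace_M2 = np.trace(M @ M)                    # tr[(H 1 1' H)^2]
```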



The main trick in the proofs of the technical lemmas below is conditioning on X and 
resorting to the basic facts from Lemma 1. 

Lemma 2. Under conditions (A) and (B), we have 

(A.5) E tr{[B56V(f)C; + C„c6V(f)B'] 2 } < ||B'B|| 0{n~ l pK z / 2 ). 

PROOF. It follows from ([A.ip that 

E tr { [BcoV(f)C^ + C n coV(f)B'] 2 } < 2 (n - 1)~ 2 E tr { [BXE' + EX'B'] 2 | 
+ 2n~ 2 (n - 1)~ 2 E tr j [BXll'HE' + EHll'X'B'] 2 } 

= 2 (n - 1)~ 2 Ai + 2n~ 2 (n - 1)~ 2 A 2 - 

We will consider the above two terms A\ and A2 separately. By c n = 0(1) in assumption 
(B), we can easily get \\E (ff ) || = 0{K) and E (||f|| 4 ) = 0{K 2 ). 
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£||BXll'HE'|| 2 = £tr 



Since E(E\f) = 0, by (|A"T|) and (|AT2j) conditioning on X results in 

i£ ||BXE'|| 2 = £ tr [BX£ (E'E|X) X'B'] = E tr [BX tr (E ) J„ X'B'] 
= n tr (S ) £ tr [Bff B'] = n tr (S ) tr [B£7 (ff ) B'] 
< n tr (S ) ||B'B|| ||£ (ff ) || = ||B'B|| tr (£„) O(ni^). 

Similarly, by conditioning on X we have 

BXll'HS (E'E|X) Hll'X'B' 
= E tr [BXll'H tr (E ) I n Hll'X'B'] , 

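and then applying (A.1) and (A.2) in Lemma 1 gives

$$\begin{aligned}
E\|\mathbf{B}\mathbf{X}\mathbf{1}\mathbf{1}'\mathbf{H}\mathbf{E}'\|^2 &\le \operatorname{tr}(\boldsymbol{\Sigma}_0)\,E\{\|\mathbf{B}'\mathbf{B}\|\,\|\mathbf{X}'\mathbf{X}\|\,\|\mathbf{1}\mathbf{1}'\|\,\|\mathbf{1}\mathbf{1}'\|\,\|\mathbf{H}\|\} \\
&\le n^2K^{1/2}\,\|\mathbf{B}'\mathbf{B}\|\operatorname{tr}(\boldsymbol{\Sigma}_0)\,\{E(\|\mathbf{X}'\mathbf{X}\|^2)\}^{1/2}.
\end{aligned}$$

Note that $E(\|\mathbf{X}'\mathbf{X}\|^2) = nE(\|\mathbf{f}\|^4) + n(n-1)\|E(\mathbf{f}\mathbf{f}')\|^2 = O(n^2K^2)$. Thus,

$$E\|\mathbf{B}\mathbf{X}\mathbf{1}\mathbf{1}'\mathbf{H}\mathbf{E}'\|^2 \le \|\mathbf{B}'\mathbf{B}\|\operatorname{tr}(\boldsymbol{\Sigma}_0)\,O(n^3K^{3/2}).$$

Therefore, by (A.1) we have

$$A_1 \le 4E\|\mathbf{B}\mathbf{X}\mathbf{E}'\|^2 \le \|\mathbf{B}'\mathbf{B}\|\operatorname{tr}(\boldsymbol{\Sigma}_0)\,O(nK)$$

and

$$A_2 \le 4E\|\mathbf{B}\mathbf{X}\mathbf{1}\mathbf{1}'\mathbf{H}\mathbf{E}'\|^2 \le \|\mathbf{B}'\mathbf{B}\|\operatorname{tr}(\boldsymbol{\Sigma}_0)\,O(n^3K^{3/2}),$$

which together yield (A.5) since clearly $\operatorname{tr}(\boldsymbol{\Sigma}_0) = O(p)$. □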
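The split into $A_1$ and $A_2$ at the start of this proof rests on an exact algebraic identity, which the sketch below verifies numerically. It assumes the least-squares form $\mathbf{C}_n = \mathbf{E}\mathbf{X}'(\mathbf{X}\mathbf{X}')^{-1}$ for the loading error and the usual sample covariance $\widehat{\operatorname{cov}}(\mathbf{f}) = (n-1)^{-1}\mathbf{X}\mathbf{X}' - [n(n-1)]^{-1}\mathbf{X}\mathbf{1}\mathbf{1}'\mathbf{X}'$; both forms, together with the dimensions and variable names, are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
p, K, n = 6, 2, 40                      # illustrative dimensions (assumption)

B = rng.standard_normal((p, K))         # factor loadings
X = rng.standard_normal((K, n))         # factors, one column per observation
E = rng.standard_normal((p, n))         # errors, one column per observation
ones = np.ones((n, 1))

XXt_inv = np.linalg.inv(X @ X.T)
H = X.T @ XXt_inv @ X                   # hat matrix
C = E @ X.T @ XXt_inv                   # assumed form of C_n = B_hat - B

# Sample covariance of the factors, written as in the decomposition.
cov_f = (X @ X.T - X @ ones @ ones.T @ X.T / n) / (n - 1)

# B cov_hat(f) C_n' = (n-1)^{-1} [ B X E' - n^{-1} B X 1 1' H E' ]
lhs = B @ cov_f @ C.T
rhs = (B @ X @ E.T - B @ X @ ones @ ones.T @ H @ E.T / n) / (n - 1)
max_gap = np.max(np.abs(lhs - rhs))

# cov_f agrees with NumPy's unbiased sample covariance (rows = variables).
cov_gap = np.max(np.abs(cov_f - np.cov(X)))
print(max_gap, cov_gap)
```

Adding the transpose of this identity and applying (A.1) to the two resulting symmetric pieces gives the bound in terms of $A_1$ and $A_2$.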

Lemma 3. Under conditions (A) and (B), we have 

(A.6) $E\operatorname{tr}\{[\mathbf{C}_n\,\widehat{\operatorname{cov}}(\mathbf{f})\,\mathbf{C}_n']^2\} = O(n^{-2}p^2K)$

and

(A.7) $E\operatorname{tr}(\mathbf{F}_n^2) = O(n^{-1}pK) + O(n^{-2}p^2K).$

Proof. The proofs of (A.6) and (A.7) are similar to those in Lemmas 4 and 5 below, respectively. For brevity, we omit them here. □

Lemma 4. Under conditions (A)-(C), we have 
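(A.8) $E\|\mathbf{C}_n\,\widehat{\operatorname{cov}}(\mathbf{f})\,\mathbf{C}_n'\|_{\Sigma}^2 = O(n^{-2}pK).$

Proof. Note that

$$\begin{aligned}
E\|\mathbf{C}_n\,\widehat{\operatorname{cov}}(\mathbf{f})\,\mathbf{C}_n'\|_{\Sigma}^2 &\le 2(n-1)^{-2}\,E\|\mathbf{E}\mathbf{H}\mathbf{E}'\|_{\Sigma}^2 + 2n^{-2}(n-1)^{-2}\,E\|\mathbf{E}\mathbf{H}\mathbf{1}\mathbf{1}'\mathbf{H}\mathbf{E}'\|_{\Sigma}^2 \\
&= 2(n-1)^{-2}\mathcal{K}_1 + 2n^{-2}(n-1)^{-2}\mathcal{K}_2.
\end{aligned}$$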

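We will consider the above two terms $\mathcal{K}_1$ and $\mathcal{K}_2$ separately. First, we study the term $\mathcal{K}_1$, which can further be decomposed into four terms. Since $E(\boldsymbol{\varepsilon}|\mathbf{f}) = \mathbf{0}$, by conditioning on $\mathbf{X}$ we have

$$\mathcal{K}_1 = p^{-1}E\operatorname{tr}\Big[E\Big(\sum_{i,j=1}^{n} H_{ij}\,\boldsymbol{\varepsilon}_i\boldsymbol{\varepsilon}_j'\boldsymbol{\Sigma}^{-1}\sum_{k,l=1}^{n} H_{kl}\,\boldsymbol{\varepsilon}_k\boldsymbol{\varepsilon}_l'\boldsymbol{\Sigma}^{-1}\,\Big|\,\mathbf{X}\Big)\Big] = p^{-1}(\mathcal{L}_1 + \mathcal{L}_2 + \mathcal{L}_3 + \mathcal{L}_4),$$

where

$$\mathcal{L}_1 = E\operatorname{tr}\Big\{\sum_{i=1}^{n}(H_{ii})^2\,E[(\boldsymbol{\varepsilon}_i\boldsymbol{\varepsilon}_i'\boldsymbol{\Sigma}^{-1})^2]\Big\}, \qquad \mathcal{L}_2 = E\operatorname{tr}\Big\{\sum_{i\ne j} H_{ii}H_{jj}\,E(\boldsymbol{\varepsilon}_i\boldsymbol{\varepsilon}_i'\boldsymbol{\Sigma}^{-1}\boldsymbol{\varepsilon}_j\boldsymbol{\varepsilon}_j'\boldsymbol{\Sigma}^{-1})\Big\},$$

$$\mathcal{L}_3 = E\operatorname{tr}\Big\{\sum_{i\ne j}(H_{ij})^2\,E[(\boldsymbol{\varepsilon}_i\boldsymbol{\varepsilon}_j'\boldsymbol{\Sigma}^{-1})^2]\Big\}, \qquad \mathcal{L}_4 = E\operatorname{tr}\Big\{\sum_{i\ne j} H_{ij}H_{ji}\,E(\boldsymbol{\varepsilon}_i\boldsymbol{\varepsilon}_j'\boldsymbol{\Sigma}^{-1}\boldsymbol{\varepsilon}_j\boldsymbol{\varepsilon}_i'\boldsymbol{\Sigma}^{-1})\Big\},$$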



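and $H_{ij}$ is the $(i,j)$-entry of the $n \times n$ hat matrix $\mathbf{H}$. Then we consider each of these four terms separately. By (1.3) and assumptions (C) and (B), it is easy to see that

(A.9) $\operatorname{tr}(\boldsymbol{\Sigma}^{-1}) = O(p)$, $\|\boldsymbol{\Sigma}^{-1}\| = O(p^{1/2})$, and $\operatorname{tr}[(\boldsymbol{\Sigma}_0\boldsymbol{\Sigma}^{-1})^2] = O(p).$

It follows from (A.3) and (A.9) that

$$\begin{aligned}
\mathcal{L}_1 &\le K\,E\{\operatorname{tr}[(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}'\boldsymbol{\Sigma}^{-1})^2]\} = K\,E\Big[\Big(\sum_{k,l=1}^{p}\varepsilon_k\varepsilon_l(\boldsymbol{\Sigma}^{-1})_{kl}\Big)^2\Big] \\
&\le [\operatorname{tr}(\boldsymbol{\Sigma}^{-1})]^2\,O(K) + \operatorname{tr}(\boldsymbol{\Sigma}_0\boldsymbol{\Sigma}^{-1}\boldsymbol{\Sigma}_0\boldsymbol{\Sigma}^{-1})\,O(K) = O(p^2K)
\end{aligned}$$

and

$$\mathcal{L}_2 = E\Big\{\sum_{i\ne j} H_{ii}H_{jj}\Big\}\operatorname{tr}[E(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}')\boldsymbol{\Sigma}^{-1}E(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}')\boldsymbol{\Sigma}^{-1}] \le \operatorname{tr}[(\boldsymbol{\Sigma}_0\boldsymbol{\Sigma}^{-1})^2]\,E\{[\operatorname{tr}(\mathbf{H})]^2\} = O(pK^2).$$

Similarly, we have

$$\begin{aligned}
\mathcal{L}_3 &\le K\,E\{\operatorname{tr}[(\boldsymbol{\varepsilon}\boldsymbol{\eta}'\boldsymbol{\Sigma}^{-1})^2]\} = K\,E\Big[\Big(\sum_{k,l=1}^{p}\eta_k\varepsilon_l(\boldsymbol{\Sigma}^{-1})_{kl}\Big)^2\Big] \\
&\le \|\boldsymbol{\Sigma}^{-1}\|^2\,O(K) + \operatorname{tr}(\boldsymbol{\Sigma}_0\boldsymbol{\Sigma}^{-1}\boldsymbol{\Sigma}_0\boldsymbol{\Sigma}^{-1})\,O(K) = O(pK)
\end{aligned}$$

and

$$\begin{aligned}
\mathcal{L}_4 &\le K\operatorname{tr}[E(\boldsymbol{\varepsilon}\boldsymbol{\eta}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\eta}\boldsymbol{\varepsilon}'\boldsymbol{\Sigma}^{-1})] = K\,[E(\boldsymbol{\varepsilon}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\varepsilon})]^2 = K\Big[\sum_{i=1}^{p}E(\varepsilon_i^2)(\boldsymbol{\Sigma}^{-1})_{ii}\Big]^2 \\
&= K\,[\operatorname{tr}(\boldsymbol{\Sigma}^{-1})\,O(1)]^2 = O(p^2K),
\end{aligned}$$

where $\boldsymbol{\eta} = (\eta_1, \ldots, \eta_p)'$ is an independent copy of $\boldsymbol{\varepsilon} = (\varepsilon_1, \ldots, \varepsilon_p)'$. Since $K \le p$ by assumption (A), combining $\mathcal{L}_1$, $\mathcal{L}_2$, $\mathcal{L}_3$, and $\mathcal{L}_4$ together gives

(A.10) $\mathcal{K}_1 = E\|\mathbf{E}\mathbf{H}\mathbf{E}'\|_{\Sigma}^2 = O(pK).$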



Now we consider the second term $\mathcal{K}_2$. By (A.4), the same calculation as above applies to show that

$$\mathcal{K}_2 = E\|\mathbf{E}\mathbf{H}\mathbf{1}\mathbf{1}'\mathbf{H}\mathbf{E}'\|_{\Sigma}^2 = O(n^2pK).$$

Therefore, combining the above results together yields (A.8). □
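The three bounds in (A.9) and the quadratic-form identity $\operatorname{tr}[(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}'\boldsymbol{\Sigma}^{-1})^2] = (\boldsymbol{\varepsilon}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\varepsilon})^2$ used for $\mathcal{L}_1$ can be illustrated numerically. The sketch below assumes the factor structure $\boldsymbol{\Sigma} = \mathbf{B}\operatorname{cov}(\mathbf{f})\mathbf{B}' + \boldsymbol{\Sigma}_0$ with diagonal $\boldsymbol{\Sigma}_0$ whose entries are bounded away from zero; the identity factor covariance, the variance range and the dimensions are hypothetical choices. Under this structure $\boldsymbol{\Sigma}^{-1}$ is dominated by $\boldsymbol{\Sigma}_0^{-1}$, which is what makes each quantity in (A.9) at most of the stated order in $p$.

```python
import numpy as np

rng = np.random.default_rng(2)
p, K = 30, 3                                        # illustrative dimensions

B = rng.standard_normal((p, K))                     # loadings
cov_f = np.eye(K)                                   # assumed factor covariance
Sigma0 = np.diag(rng.uniform(0.5, 2.0, size=p))     # diagonal error covariance
Sigma = B @ cov_f @ B.T + Sigma0                    # factor-model covariance
Sigma_inv = np.linalg.inv(Sigma)
Sigma0_inv = np.diag(1.0 / np.diag(Sigma0))

# Quantities appearing in (A.9).
tr_inv = np.trace(Sigma_inv)
fro_inv = np.linalg.norm(Sigma_inv)                 # Frobenius norm
tr_sq = np.trace(Sigma0 @ Sigma_inv @ Sigma0 @ Sigma_inv)

# Comparison values: Sigma >= Sigma0 in the positive semidefinite order.
tr_inv_bound = np.trace(Sigma0_inv)
fro_inv_bound = np.linalg.norm(Sigma0_inv)

# Identity tr[(eps eps' Sigma^{-1})^2] = (eps' Sigma^{-1} eps)^2.
eps = rng.standard_normal(p)
M = np.outer(eps, eps) @ Sigma_inv
quad_lhs = np.trace(M @ M)
quad_rhs = (eps @ Sigma_inv @ eps) ** 2

print(tr_inv, tr_inv_bound, fro_inv, fro_inv_bound, tr_sq, p)
```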

Lemma 5. Under conditions (A)-(C), we have 

(A.11) $E\|\mathbf{F}_n\|_{\Sigma}^2 = O(n^{-1}) + O(n^{-2}pK).$

Proof. Note that 

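$$E\|\mathbf{F}_n\|_{\Sigma}^2 \le 2E\|\mathbf{I}_p \circ n^{-1}\mathbf{E}\mathbf{E}' - \boldsymbol{\Sigma}_0\|_{\Sigma}^2 + 2n^{-2}E\|\mathbf{I}_p \circ \mathbf{E}\mathbf{H}\mathbf{E}'\|_{\Sigma}^2.$$

Since $E(\boldsymbol{\varepsilon}) = \mathbf{0}$ and $\operatorname{cov}(\boldsymbol{\varepsilon}|\mathbf{f}) = \boldsymbol{\Sigma}_0$, we have

$$\begin{aligned}
E\|\mathbf{I}_p \circ n^{-1}\mathbf{E}\mathbf{E}' - \boldsymbol{\Sigma}_0\|_{\Sigma}^2 &= p^{-1}E\|n^{-1}\boldsymbol{\Sigma}^{-1/2}(\mathbf{I}_p \circ \mathbf{E}\mathbf{E}')\boldsymbol{\Sigma}^{-1/2} - \boldsymbol{\Sigma}^{-1/2}\boldsymbol{\Sigma}_0\boldsymbol{\Sigma}^{-1/2}\|^2 \\
&\le p^{-1}n^{-1}E\|\boldsymbol{\Sigma}^{-1/2}\operatorname{diag}(\varepsilon_1^2, \ldots, \varepsilon_p^2)\boldsymbol{\Sigma}^{-1/2}\|^2 \\
&= p^{-1}n^{-1}E\operatorname{tr}\{[\operatorname{diag}(\varepsilon_1^2, \ldots, \varepsilon_p^2)\boldsymbol{\Sigma}^{-1}]^2\}.
\end{aligned}$$

It follows from (A.9) that

$$\begin{aligned}
E\operatorname{tr}\{[\operatorname{diag}(\varepsilon_1^2, \ldots, \varepsilon_p^2)\boldsymbol{\Sigma}^{-1}]^2\} &= \sum_{i,j=1}^{p}E(\varepsilon_i^2\varepsilon_j^2)[(\boldsymbol{\Sigma}^{-1})_{ij}]^2 \\
&= \sum_{i=1}^{p}E(\varepsilon_i^4)[(\boldsymbol{\Sigma}^{-1})_{ii}]^2 + \sum_{i\ne j}E(\varepsilon_i^2\varepsilon_j^2)[(\boldsymbol{\Sigma}^{-1})_{ij}]^2 \\
&\le \|\boldsymbol{\Sigma}^{-1}\|^2\,O(1) = O(p),
\end{aligned}$$

which shows that $E\|\mathbf{I}_p \circ n^{-1}\mathbf{E}\mathbf{E}' - \boldsymbol{\Sigma}_0\|_{\Sigma}^2 = O(n^{-1})$. The argument proving (A.10) in Lemma 4 applies to show that

$$E\|\mathbf{I}_p \circ \mathbf{E}\mathbf{H}\mathbf{E}'\|_{\Sigma}^2 = O(pK).$$

Hence, combining the above results together gives (A.11). □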
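The first inequality in this proof splits $\mathbf{F}_n$ into a part driven by $\mathbf{E}\mathbf{E}'$ and a part driven by $\mathbf{E}\mathbf{H}\mathbf{E}'$. A minimal sketch of why such a split arises, assuming (this is not stated in this section) that the loadings are estimated by least squares without intercept, $\widehat{\mathbf{B}} = \mathbf{Y}\mathbf{X}'(\mathbf{X}\mathbf{X}')^{-1}$, so that the residual matrix equals $\mathbf{E}(\mathbf{I}_n - \mathbf{H})$: then the residual cross-product is exactly $\mathbf{E}\mathbf{E}' - \mathbf{E}\mathbf{H}\mathbf{E}'$, and the Hadamard product with $\mathbf{I}_p$ keeps its diagonal. All names and dimensions below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
p, K, n = 5, 2, 30                       # illustrative dimensions (assumption)

B = rng.standard_normal((p, K))          # true loadings
X = rng.standard_normal((K, n))          # factors
E = rng.standard_normal((p, n))          # errors
Y = B @ X + E                            # observed returns

# Assumed least-squares estimator of the loadings and its residuals.
B_hat = Y @ X.T @ np.linalg.inv(X @ X.T)
resid = Y - B_hat @ X
H = X.T @ np.linalg.solve(X @ X.T, X)    # hat matrix

# Residual cross-product equals EE' - EHE' because I - H is idempotent.
gap = np.max(np.abs(resid @ resid.T - (E @ E.T - E @ H @ E.T)))

# I_p o A (Hadamard product with the identity) retains only the diagonal.
A = resid @ resid.T / n
diag_part = np.eye(p) * A
print(gap, np.diag(diag_part))
```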

Lemma 6. Under conditions (A)-(C), we have 

(A.12) $E\|\mathbf{G}_n\|_{\Sigma}^2 = O(n^{-1}p).$

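Proof. Recall that $\mathbf{G}_n = \{(n-1)^{-1}\mathbf{E}\mathbf{E}' - [n(n-1)]^{-1}\mathbf{E}\mathbf{1}\mathbf{1}'\mathbf{E}'\} - \boldsymbol{\Sigma}_0$, as defined in part (2) of the proof of Theorem 1. Note that

$$\begin{aligned}
E\|\mathbf{G}_n\|_{\Sigma}^2 &\le 3E\|n^{-1}\mathbf{E}\mathbf{E}' - \boldsymbol{\Sigma}_0\|_{\Sigma}^2 + 3n^{-2}(n-1)^{-2}E\|\mathbf{E}\mathbf{E}'\|_{\Sigma}^2 \\
&\quad + 3n^{-2}(n-1)^{-2}E\|\mathbf{E}\mathbf{1}\mathbf{1}'\mathbf{E}'\|_{\Sigma}^2.
\end{aligned}$$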

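From the proofs of $\mathcal{L}_1$ and $\mathcal{L}_4$ in Lemma 4, we know that

$$E\{\operatorname{tr}[(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}'\boldsymbol{\Sigma}^{-1})^2]\} = O(p^2) \quad\text{and}\quad E[\operatorname{tr}(\boldsymbol{\varepsilon}\boldsymbol{\eta}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\eta}\boldsymbol{\varepsilon}'\boldsymbol{\Sigma}^{-1})] = O(p^2),$$

where $\boldsymbol{\eta} = (\eta_1, \ldots, \eta_p)'$ is an independent copy of $\boldsymbol{\varepsilon} = (\varepsilon_1, \ldots, \varepsilon_p)'$. Thus, we have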

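$$\begin{aligned}
E\|n^{-1}\mathbf{E}\mathbf{E}' - \boldsymbol{\Sigma}_0\|_{\Sigma}^2 &= p^{-1}E\|n^{-1}\boldsymbol{\Sigma}^{-1/2}\mathbf{E}\mathbf{E}'\boldsymbol{\Sigma}^{-1/2} - \boldsymbol{\Sigma}^{-1/2}\boldsymbol{\Sigma}_0\boldsymbol{\Sigma}^{-1/2}\|^2 \\
&\le p^{-1}n^{-1}E\|\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}'\boldsymbol{\Sigma}^{-1/2}\|^2 = p^{-1}n^{-1}E\{\operatorname{tr}[(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}'\boldsymbol{\Sigma}^{-1})^2]\} = O(n^{-1}p).
\end{aligned}$$

Similarly, it follows that

$$\begin{aligned}
E\|\mathbf{E}\mathbf{E}'\|_{\Sigma}^2 &= E\|(\boldsymbol{\varepsilon}_1, \ldots, \boldsymbol{\varepsilon}_n)(\boldsymbol{\varepsilon}_1, \ldots, \boldsymbol{\varepsilon}_n)'\|_{\Sigma}^2 \le np^{-1}E\|\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}'\boldsymbol{\Sigma}^{-1/2}\|^2 \\
&= np^{-1}E\{\operatorname{tr}[(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}'\boldsymbol{\Sigma}^{-1})^2]\} = O(np)
\end{aligned}$$

and

$$\begin{aligned}
E\|\mathbf{E}\mathbf{1}\mathbf{1}'\mathbf{E}'\|_{\Sigma}^2 &\le np^{-1}E\|\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}'\boldsymbol{\Sigma}^{-1/2}\|^2 + n(n-1)p^{-1}E\|\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\varepsilon}\boldsymbol{\eta}'\boldsymbol{\Sigma}^{-1/2}\|^2 \\
&\le np^{-1}E\{\operatorname{tr}[(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}'\boldsymbol{\Sigma}^{-1})^2]\} + n(n-1)p^{-1}E[\operatorname{tr}(\boldsymbol{\varepsilon}\boldsymbol{\eta}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\eta}\boldsymbol{\varepsilon}'\boldsymbol{\Sigma}^{-1})] = O(n^2p).
\end{aligned}$$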



Therefore, combining the above results together proves (A.12). □
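The three-term bound at the start of this proof comes from writing $(n-1)^{-1} = n^{-1} + [n(n-1)]^{-1}$ and applying $\|a+b+c\|^2 \le 3(\|a\|^2 + \|b\|^2 + \|c\|^2)$. The sketch below checks both steps on simulated errors and also confirms that the braced part of $\mathbf{G}_n$ is the unbiased sample covariance of the errors. It assumes the norm $\|\mathbf{A}\|_{\Sigma}^2 = p^{-1}\operatorname{tr}(\mathbf{A}\boldsymbol{\Sigma}^{-1}\mathbf{A}'\boldsymbol{\Sigma}^{-1})$ and, purely for simplicity, takes $\boldsymbol{\Sigma} = \boldsymbol{\Sigma}_0$; the helper `sigma_norm_sq`, the dimensions and the variance range are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 4, 25                                   # illustrative dimensions

sigma0 = rng.uniform(0.5, 2.0, size=p)         # error variances
Sigma0 = np.diag(sigma0)
Sigma_inv = np.diag(1.0 / sigma0)              # simplification: Sigma = Sigma0
E = rng.standard_normal((p, n)) * np.sqrt(sigma0)[:, None]
ones = np.ones((n, 1))


def sigma_norm_sq(A, Sigma_inv):
    """Squared norm p^{-1} tr(A Sigma^{-1} A' Sigma^{-1}) (assumed definition)."""
    return np.trace(A @ Sigma_inv @ A.T @ Sigma_inv) / A.shape[0]


EEt = E @ E.T
E11E = E @ ones @ ones.T @ E.T
G = (EEt / (n - 1) - E11E / (n * (n - 1))) - Sigma0

# The three pieces used in the bound.
term1 = EEt / n - Sigma0
term2 = EEt / (n * (n - 1))
term3 = -E11E / (n * (n - 1))
split_gap = np.max(np.abs(G - (term1 + term2 + term3)))

# G + Sigma0 is the unbiased sample covariance of the rows of E.
cov_gap = np.max(np.abs(np.cov(E) - (G + Sigma0)))

lhs = sigma_norm_sq(G, Sigma_inv)
rhs = 3 * sum(sigma_norm_sq(t, Sigma_inv) for t in (term1, term2, term3))
print(split_gap, cov_gap, lhs, rhs)
```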
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